Lecture Notes in 
Computer Science 1613 



Attila Kuba Martin Šámal 
Andrew Todd-Pokropek (Eds.) 



Information Processing 
in Medical Imaging 

16th International Conference, IPMI’99 
Visegrad, Hungary, June/July 1999 
Proceedings 




Springer 




Lecture Notes in Computer Science 1613 

Edited by G. Goos, J. Hartmanis and J. van Leeuwen 




Springer 

Berlin 

Heidelberg 

New York 

Barcelona 

Hong Kong 

London 

Milan 

Paris 

Singapore 

Tokyo 





Attila Kuba Martin Samal 
Andrew Todd-Pokropek (Eds.) 



Information Processing 
in Medical Imaging 



16th International Conference, IPMI’99 
Visegrad, Hungary, June 28 - July 2, 1999 
Proceedings 




Springer 




Series Editors 



Gerhard Goos, Karlsruhe University, Germany 
Juris Hartmanis, Cornell University, NY, USA 
Jan van Leeuwen, Utrecht University, The Netherlands 



Volume Editors 
Attila Kuba 

Department of Applied Informatics, Jozsef Attila University 
Arpad ter 2., H-6720 Szeged, Hungary 
E-mail: kuba@inf.u-szeged.hu 

Martin Samal 

Institute of Nuclear Medicine, Charles University of Prague 
Salmovska 3, CZ-120 00 Prague, Czech Republic 
E-mail: samal@cesnet.cz 

Andrew Todd-Pokropek 

Department of Medical Physics, University College London 
Gower Street, London WC1E 6BT, UK 
E-mail: a.todd@ucl.ac.uk 



Cataloging-in-Publication data applied for 
Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Information processing in medical imaging ; 16th international conference ; 
proceedings / IPMI ’99, Visegrad, Hungary, June 28 - July 2, 1999. Attila Kuba 
. . . (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; 
Milan ; Paris ; Singapore ; Tokyo ; Springer, 1999 
(Lecture notes in computer science ; Vol. 1613) 

ISBN 3-540-66167-0 



CR Subject Classification (1998): I.4, I.2.5-6, J.3 
ISSN 0302-9743 

ISBN 3-540-66167-0 Springer-Verlag Berlin Heidelberg New York 



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are 
liable for prosecution under the German Copyright Law. 

© Springer-Verlag Berlin Heidelberg 1999 
Printed in Germany 

Typesetting: Camera-ready by author 

SPIN 10705157 06/3142 - 5 4 3 2 1 0 Printed on acid-free paper 




Preface 



The 1999 international conference on Information Processing in Medical Imaging 
(IPMI ’99) was the sixteenth in the series of biennial meetings and followed the 
successful meeting in Poultney, Vermont, in 1997. This year, for the first time, 
the conference was held in central Europe, in the historical Hungarian town 
of Visegrad, one of the most beautiful spots not only on the Danube Bend 
but in all Hungary. The place has many historical connections, both national 
and international. The castle was once a royal palace of King Matthias. In the 
middle ages, the Hungarian, Czech, and Polish kings met here. Recently, after 
the summit meeting of reestablished democracies in the area, it became a symbol 
for the cooperation between central European countries as they approached the 
European Union. It was thus also symbolic to bring IPMI, in the year of the 
30th anniversary of its foundation, to this place, and organize the meeting with 
the close cooperation of local and traditional western organizers. 

It also provided a good opportunity to summarize briefly a history of IPMI 
for those who were new to the IPMI conference. 

This year we received 82 full paper submissions from all over the world. 
Of these, 24 were accepted as oral presentations. These were divided into 6 
sessions. In spite of our efforts, it was found to be impossible to make these 
sessions fully balanced and homogeneous. Therefore, the session titles express 
the leading themes of the respective sessions rather than provide a thorough 
description of all papers included in each of them. 

The first session (traditionally) dealt with new imaging techniques. The top- 
ics here span from an analytical study of bioelasticity using ultrasound, to mul- 
tipolar MEG, binary tomography, and navigated surgery. The second session 
concerned image processing in three-dimensional ultrasonography and dynamic 
PET. The third and the fifth sessions presented classic IPMI topics about image 
segmentation and registration. The papers on segmentation brought new ideas 
about hybrid geometric snake pedals, geodesic active contours, adaptive fuzzy 
segmentation, and segmentation of evolving processes in three dimensions. Pa- 
pers on registration expanded both linear and non-linear approaches to elastic 
transformations and introduced hierarchical deformation models for 4-D cardiac 
SPECT data. The fourth session included a mixture of papers on segmentation 
and registration as applied to analysis of images of the brain cortex. The final 
(sixth) session dealt with feature detection and modelling. It included detection 
of masses in mammography, physiologically oriented models for functional MRI, 
comparison of MR and x-ray angiography, and a unified framework for atlas 
matching based on active appearance models. It was an explicit requirement 
of the IPMI Board, as well as the conviction of the organizers, to insist on a 
demonstration of medical applicability of all the image processing methods pre- 
sented. We believe that all the selected papers fulfill this difficult but crucial 
criterion. The time allotted to oral presentations was 20 minutes plus 30 minutes 
for a (scheduled) discussion which, however, by IPMI tradition is virtually unlimited 
and depends on the importance of the problem, the clarity of the paper 
and the interest of the audience. In the proceedings, the space allotted to each 
oral presentation is 14 pages. It is a compromise between a need to provide the 
readers with sufficient details of the presentation and a requirement to keep the 
extent of the book to within 500 pages. The organizers regret that they could 
not accept the often justified requests of many authors to expand the space for 
their papers. 

An additional 28 submissions were accepted as poster presentations. Ample 
time was given to the audience to meet the authors in front of their posters and to 
discuss the presentations in depth. In addition to short oral presentations, each 
of two poster sessions was concluded by a plenary discussion. In the proceedings, 
the space allotted to each poster presentation is 6 pages. 

The poster presentations were divided into 2 sessions. The first dealt with var- 
ious methods of cardiovascular image analysis, modelling and analysis of shapes, 
and with the segmentation and detection of specific image structures. The sec- 
ond concerned reconstruction, measurement in medical images, registration, and 
image modelling. Although oral papers and plenary discussions form the tradi- 
tional basis of the IPMI meeting, the introduction of poster sessions further 
enlarged the space permitted for additional topics, for considering more specific 
applications, and for extended informal discussions. 

The uniqueness of the IPMI meeting has been emphasized from various per- 
sonal viewpoints in the forewords of previous proceedings. It consists in a magic 
mixture of an interdisciplinary approach, informal communication, thorough dis- 
cussions, high scientific standards, the promotion of young researchers, and a 
friendly atmosphere. It is a great responsibility for the organizers to cultivate 
the IPMI tradition and sustain all its many flavours for the future. We sincerely 
wish IPMI many happy returns for its 30th birthday and wish it well, long into 
the 21st century. 



March 1999 



Attila Kuba 
Martin Samal 
Andrew Todd-Pokropek 
A brief history of the universe, the IPMI phenomenon 
(IPMI 1969-1999) 

The big bang from the point of view of IPMI took place in late 1969. Some 20 odd 
(some of them very odd) researchers gathered together in Brussels for an ad hoc 
meeting on the use of computers in nuclear medicine, sponsored by a grant from 
Euratom, obtained by our first ‘president’, François Erbsmann. The meeting 
was originally given the name Information Processing in Scintigraphy, and only 
Europeans participated. It is worth noting that at that time a computer with 4K 
of memory was considered respectable. By 1971, the expansion of this Universe 
had reached Hannover, under our second president, Eberhard Jahns, and by this 
time some grit from across the Atlantic had also been incorporated. From these 
first few seconds of the expanding IPMI universe, little (written) trace remains 
of the white heat of invention. The first meeting produced no written record, and 
the second proceedings only exist (but do exist) as an unpublished manuscript. 
One of the pieces of grit, present at the 2nd meeting, agreed to run the 3rd 
meeting in Boston (Cambridge), and so Steve Pizer (aided by Charlie Metz) 
permitted continuing expansion to North America. As a result of this, IPMI 
has been established as an oscillating universe with a period of 2 years with, at 
this interval, the centre of gravity switching between Europe and North Amer- 
ica. We have considered further expansion to the far east, Australia, or South 
America, but have been prevented from doing so by the strong force effect (lack 
of money). A rare photograph exists of some of the participants at the Boston 
meeting, lounging on a lawn, not wearing very much, and observing attractive 
students go by. Ah, the universe was young then! By now, the ratio of North 
American contributions had reached 50%, a value which has been maintained. 
Although scintigraphy (nuclear medicine) was still the target application, tomo- 
graphic reconstruction was considered important and a number of general image 
processing papers foreshadow a slow drift towards computer vision applications. 

Two years later, in 1975, the meeting switched to Paris (Orsay) which I ran. 
The meeting was now scheduled for a total of 5 days, with one free afternoon, 
and another long lasting phenomenon was discovered, that of the IPMI football 
(soccer) match. Despite unwarranted complaints about bias in refereeing, this 
match has always been won by the European team, and it is hoped and antici- 
pated that this strange effect will be preserved. We have also always had a few 
female scientists present at the IPMI meetings, but regrettably their charm has 
only been present in limited numbers. In Paris the IPMI universe reached the 
number of 100 participants, and a major aim of the meeting has been to try to 
limit total numbers to this order. IPMI has always permitted long presentations 
with effectively unlimited time for ensuing questions, and it has been a second 
major aim of IPMI to try to remove limits in total time to permit this disorder. 

In 1977 we arrived in Nashville under the leadership of Randy Brill, and 
the title of the meeting now changed officially to Information Processing in 
Medical Imaging (IPMI). Many other clinical applications were now included, 
such as angiography, ultrasound, and CT, with a significant component of to- 
mographic reconstruction. In 1979 we returned to (central) Paris, under the 
direction of Robert Di Paola, the first paper being about a relatively novel technique, 
magnetic resonance. The spin doctors have increased their influence at 
each successive meeting. The proceedings were published by INSERM. In 1981 
we were laid back in California, activated by the acerbic wit of Michael Goris, 
but mainly thinking about nuclear medicine and ultrasound (and Californian 
fruit and wine). A major theme of the meeting was applications in cardiology 

In 1983 we returned to Brussels, led by Frank Deconinck, and the first publication 
of the proceedings by a regular publisher was produced. All subsequent 
proceedings have been published, by Martinus Nijhoff, Plenum, Wiley, Kluwer, 
and for the majority, Springer. Papers such as ‘Image analysis- topological meth- 
ods’ indicated new directions in scale space, and more substantial mathematical 
presentations. While evaluation continued to be an important topic, the meeting 
welcomed novel acquisition methods, here Impedance Tomography. In 1985 we 
passed to Washington and Steve Bacharach. While the scientific highlights of the 
meeting were significant, a couple of our Scottish participants yet again remain 
fondly in our memories as being those most responsible for the excellent social 
interactions always a feature of IPMI (here the infamous fire alarm incident) . 

We were received in The Netherlands in 1987 by Max Viergever and now 
bathed in the more abstract universe of general image processing (meta-models, 
multiresolution shape) whilst retaining our interest in reconstruction. One of 
our present chairmen gave his first paper expressing his deep angst with the title 
‘The reality and meaning of physiological factors’. As usual, the bar near our 
student accommodation remained open late in the night as the deeper notions 
of Information Processing were explored. 

As a result of the tragic death of our first chairman, it was here that the 
François Erbsmann prize was established in recognition of his original intention, 
to aim the meeting towards promoting the work of young scientists (even if some 
of the lengthy questions and answer sessions do not always seem to reflect this) . 
I should also sadly point out that we have also lost our 2nd chairman, Eberhard 
Jahns, as a result of a car accident. However, I am pleased to report that as far 
as we are aware, all the rest have so far survived (despite the ravages of time 
and of our Scottish colleagues). 

Two years later we returned to California (1989), now as Berkeley ageing 
hippies (or at least some of us). MRI was now considered to warrant a whole 
session, segmentation even more, but image reconstruction was the major topic 
here. The quality of the papers had now reached a level where the competition to 
be included was such that then (and we hope now) authors reserved their best 
papers for this meeting, and braced themselves for the Spanish inquisition of 
the questions following their presentations. The final decade of the 20th century 
dawned for the IPMI universe in Wye in England, organised by Alan Colch- 
ester and Dave Hawkes (1991). The quality of the meeting seemed to have been 
maintained, as were the traditions. Multi-modality approaches appeared, MR 
was the dominant image type, and computer vision methods were emphasised. 
Posters were first introduced, but not published, at this meeting. A highlight 
was probably the sometimes violent philosophical discussions about whether an 
edge could actually exist. On the final day, after the football match, Nico Karssemeijer 
was obliged to present his paper in plaster, having had his leg broken, 
illustrating yet again our tremendous dedication to science. From the embrace 
of the elegant pubs in Kent, 2 years later we stormed the mountains of Arizona, 
to be precise, Flagstaff, under the leadership of Harry Barrett (1993). The meet- 
ing continued its traditions, ranging from discussions of higher order differential 
structures to optical tomography. The arguments about segmentation continued, 
we were somewhat on edge and tetchy (a skeleton in our cupboard?), but the high 
point was reached by those climbing to the top of the nearby Humphrey’s peak 
(3850m). The highlight of our next meeting, in Brest directed by Yves Bizais, 
was certainly the student celebration of their end of term where a group of them 
promenaded with loud drumming throughout the night. Fortunately, this did 
not worry everyone, as the student bar rarely closed before dawn. More posters 
were presented and now included in the proceedings. This excellent meeting was 
followed by that organised by Jim Duncan in Poultney, Vermont. The scientific 
quality was again considered to be excellent, and the surroundings beautiful. 
Neuroscience here clearly dominated other clinical applications. Despite this in- 
creasing interest in brains, somehow (again!) the Europeans won the football 
match. During the outing to a ski-resort a number of participants found refuge 
from the plague of insects, against instructions, by skiing down an icy ski-run 
(exceptionally open in June in a heat wave!). Jim Duncan as (to date) our last 
chairman has said how much he appreciated the cooperation and respect given 
to him by the enthusiastic participants at an IPMI meeting. At least he did not 
have to rescue any from jail as I have had to in the past. 

I do not know what will be the highlights of the current meeting for which 
these proceedings represent the written trace. I hope that the scientific expansion 
of the meeting will continue, and that in the social context, we will also continue 
in the long tradition of IPMI to enjoy ourselves, have fun, and make many 
new friends. The proceedings of this meeting only reflect a small part of the 
value of the IPMI experience. The length of time allocated for questions and 
answers after presentations is an important part of the IPMI experiment, but 
unfortunately is not recorded (perhaps fortunately in some cases). A new ‘Special 
Prize for Brilliance’ has been suggested. The ability to discover and discuss new 
approaches in depth is just as important, which has always been the justification 
for limiting the total number of participants. 

This brief and certainly biased history of the IPMI universe has not men- 
tioned the first presentations of some very significant results, nor included the 
names of all the co-chairmen of the meetings, and especially the names of all 
the participants without whom the meetings could never have happened or been 
successful. Let us hope that the strange charm of the meeting will persist (where 
you can find a GUT in a TOE), with its ups and downs, strung together in the- 
ory, without loss of colourful traditions. Can this be maintained? This question 
is perhaps the big crunch (or is that the result of the next football match)? 

March 1999 Andrew Todd-Pokropek 
1987 10th IPMI, Utrecht, NL: 

John M. Gauch 

Dept. of Computer Science, University of North Carolina, Chapel Hill, NC, USA 
Gauch, J.M., Oliver, WR, Pizer, SM: Multiresolution shape descriptions and 
their applications in medical imaging. In Information Processing in Medical 
Imaging. Eds. de Graaf, CN, Viergever, MA, Plenum, New York (1988) 131-149 

1989 11th IPMI, Berkeley, CA, USA: 

Arthur F. Gmitro 

Dept. of Radiology, University of Arizona, Tucson, AZ, USA 
Gmitro, A.F., Tresp, V., Chen, Y., Snell, R., Gindi, G.R.: Video-rate reconstruc- 
tion of CT and MR images. In Information Processing in Medical Imaging. Eds. 
D.A. Ortendahl, J. Llacer, Wiley-Liss, New York (1991) 197-210 

1991 12th IPMI, Wye (Kent), UK: 

H. Isil Bozma 

Dept. of Electrical Engineering, Yale University, New Haven, CT, USA 
Bozma, H.I., Duncan, J.S.: Model-based recognition of multiple deformable ob- 
jects using a game-theoretic framework. In Information Processing in Medical 
Imaging. Eds. Colchester, A.C.F., Hawkes, D.J., Springer, Berlin (1991) 358-372 

1993 13th IPMI, Flagstaff, AZ, USA: 

Jeffrey A. Fessler 

Division of Nuclear Medicine, University of Michigan, Ann Arbor, MI, USA 
Fessler, J.A.: Tomographic reconstruction using information-weighted spline 
smoothing. In Information Processing in Medical Imaging. Eds. Barrett, H.H., 
Gmitro, A.F., Springer, Berlin (1993) 372-386 

1995 14th IPMI, Brest, France: 

Maurits K. Konings 

Dept. of Radiology and Nuclear Medicine, University Hospital Utrecht, Utrecht, 
The Netherlands 

Konings, M.K., Mali, W.P.T.M., Viergever, M.A.: Design of a robust strategy to 
measure intravascular electrical impedance. In Information Processing in Medical 
Imaging. Eds. Bizais, Y., Barillot, C., Di Paola, R., Kluwer Academic, Dordrecht 
(1995) 1-12 

1997 15th IPMI, Poultney, VT, USA: 

David Atkinson 

UMDS, Radiological Sciences, Guy’s Hospital, London, United Kingdom 
Atkinson, D., Hill, D.L.G., Stoyle, P.N.R., Summers, P.E., Keevil, S.F.: An autofocus 
algorithm for the automatic correction of motion artifacts in MR images. 
In Information Processing in Medical Imaging. Eds. Duncan, J., Gindi, G., 
Springer, Berlin (1997) 341-354 
Abstract. A framework is presented for designing and evaluating bioelasticity 
imaging systems. 



1 Introduction 



Manual palpation has been an essential technique for diagnosing disease since 
the time of the ancient Greeks. They found that by compressing the surface 
of the skin a stress field is created inside the elastic tissues of the body that 
can be sensed by the fingertips. Regions atop stiff objects like cancerous lesions 
produce a greater restoring force at the skin surface than do adjacent regions. 
Hence, abnormalities may be detected and, in some cases, identified and sized 
based on their elasticity. The clinical success of manual palpation is based on the 
high elasticity contrast that exists for many pathologies - orders of magnitude 
for some cancers Q - producing intense stress fields that make it easy to detect 
surface lesions. Unfortunately those stress fields decay rapidly with distance from 
the lesion, so it is difficult to sense objects deep in the body. 

Elasticity imaging is palpation by remote sensing. It is the name for a class 
of techniques used to visualize tissue stiffness with a sensitivity and spatial res- 
olution much greater than manual palpation. Often local elastic properties are 
imaged using ultrasonic or magnetic resonance signals to track local movements 
in mechanically stimulated tissues Il2ldl4l5l . We use ultrasound to track the mo- 
tion produced during static compression ISI7I5I . Two sets of radio-frequency echo 
signals are recorded from a region in the body before and after applying a small 
compressive force. The two echo fields are compared using a series of correlation 
techniques to register the data and thereby estimate displacement in one, two, 
or three dimensions depending on the boundary conditions for motion and the 
dimensionality of the echo fields. Spatial derivatives of the displacement field 
are combined to estimate strain tensor components that we call strain images. If 
the stress field is approximately uniform, then strain is inversely proportional to 
elasticity, and strain images describe tissue stiffness directly. The key to elasticity 
imaging is precise displacement estimation at high spatial resolution. 
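
The processing chain described above (track displacement by correlation, then differentiate to obtain strain) can be illustrated with a small 1-D sketch. Everything in it is an assumption made for illustration and is not the authors' algorithm: the echo line is a toy baseband signal rather than RF data, the helper name `track_shifts`, the window sizes, and the applied strain `s_true` are hypothetical, and the strain is taken as the slope of a straight-line fit, which is only appropriate for a uniform strain field.

```python
import numpy as np

def track_shifts(pre, post, win=256, step=128, max_lag=8):
    """Integer shift of each pre-compression window that best matches post."""
    lags = np.arange(-max_lag, max_lag + 1)
    centers, shifts = [], []
    for start in range(max_lag, len(pre) - win - max_lag, step):
        ref = pre[start:start + win]
        ncc = []
        for lag in lags:
            seg = post[start + lag:start + lag + win]
            # normalized cross-correlation between the two windows
            ncc.append(np.dot(ref, seg) / (np.linalg.norm(ref) * np.linalg.norm(seg)))
        shifts.append(lags[int(np.argmax(ncc))])
        centers.append(start + win / 2)
    return np.array(centers), np.array(shifts, dtype=float)

# Toy pre-compression echo line: white noise smoothed by a short Gaussian pulse.
rng = np.random.default_rng(0)
n, s_true = 4096, 0.001           # s_true: hypothetical uniform compressive strain
taps = np.arange(-6, 7)
pulse = np.exp(-0.5 * (taps / 1.5) ** 2)
pre = np.convolve(rng.standard_normal(n), pulse, mode="same")

# Post-compression line: a feature at depth X moves to (1 - s) X.
x = np.arange(n)
post = np.interp(x / (1.0 - s_true), x, pre)

centers, shifts = track_shifts(pre, post)
# Displacement is about -s * depth, so strain is minus the slope of the fit.
strain_est = -np.polyfit(centers, shifts, 1)[0]
```

A local strain image would replace the global line fit with a local spatial derivative of the displacement estimates, at the cost of more noise per pixel.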

Ostensibly the procedure for creating strain images is straightforward, but 
in practice achieving high-quality images requires great attention to detail. We 
must seek a careful balance between three experimental conditions: high waveform 
coherence and accurate displacement estimation are required for low noise 
and superior spatial resolution, and a large applied compression yields high strain 
contrast. Conditions resulting in high strain contrast often produce severe decor- 
relation noise, i.e. strain noise caused by the inability of the image formation 
algorithm to track motion when there is low coherence between pre- and post- 
compression echo fields. A balance is achieved by carefully selecting the applied 
stress, boundary conditions, ultrasonic system parameters, and signal processing, 
none of which are independent. Thus far, the designs of most elasticity imaging 
experiments are empirical. Comprehensive analyses provided by the time-delay 
estimation literature |S| are of limited value because, unlike most radar and sonar 
applications, ultrasound echo signals are stochastic and the spatially-spread scat- 
terers move in three dimensions when tissue is deformed. 

This paper briefly summarizes a maximum-likelihood (ML) strategy for ultra- 
sonic strain image formation and outlines a new approach for evaluating experi- 
mental designs. The evaluation is based on the Fourier crosstalk matrix concept 
originated by Barrett and Gifford m for designing medical imaging systems. 
We describe two mathematical models of ultrasonic waveforms recorded from a 
deformed object. A continuous model leads to the ML approach to strain imag- 
ing. A discrete model leads to the crosstalk matrix. The paper concludes with 
applications of the crosstalk matrix to the evaluation of system design. 



2 Continuous Waveform Model 



Biological tissues are modeled as incompressible, viscoelastic materials contain- 
ing randomly positioned point scatterers. The object function that describes 
the spatial distribution of scatterers is the acoustic impedance field, $z(\mathbf{x})$, a 
zero-mean, Gaussian random process. The three-space coordinate vector is 
$\mathbf{x} = (x_1, x_2, x_3)^t$, where $\mathbf{x}^t$ is the transpose of $\mathbf{x}$. A shift-invariant sensitivity 
function¹ $h(\mathbf{x})$ maps the object function $z(\mathbf{x})$ into the echo data $r(\mathbf{x})$ over a region 
of support $S$ according to the convolution equation 



$$ r(\mathbf{x}) = \int_S d\mathbf{x}'\, h(\mathbf{x} - \mathbf{x}')\, z(\mathbf{x}') + n_0(\mathbf{x}) = \bar{r}(\mathbf{x}) + n_0(\mathbf{x}) . \tag{1} $$ 



The additive noise process $n_0(\mathbf{x})$ is signal independent, zero-mean, band-pass 
white, and Gaussian with power spectral density $G_n$, i.e., 

$$ E\{n_0(\mathbf{x})\, z(\mathbf{x})\} = 0 , \quad E\{n_0(\mathbf{x})\} = 0 , \quad E\{n_0(\mathbf{x})\, n_0(\mathbf{x}')\} = G_n\, \delta(\mathbf{x} - \mathbf{x}') , $$ 

where $E\{fg\}$ is the expected value taken over all $f$ and $g$. We assume a 2-D echo 
field from a linear array transducer. An echo field is a collection of waveforms 
recorded from parallel transducer beams oriented along the coordinate axis $x_1$. 
Adjacent waveforms are arranged parallel to $x_2$. The scan plane is located at 
$(x_1, x_2, 0)$ (Fig. 1). Before compression (Fig. 1a), the object is scanned to record 
a precompression echo field defined by (1). 

¹ Sensitivity functions combine the pulse-echo system response with two frequency- 
dependent functions that describe scattering and absorption in the medium. If the 
system response function is Gaussian, the Fourier transform of the sensitivity 
function, (6), is approximately Gaussian m. 
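
As an illustration of the convolution model with additive noise, the following 1-D sketch simulates one echo waveform as a pulse convolved with a zero-mean Gaussian impedance sequence plus white Gaussian noise. The pulse shape, its centre frequency, the noise level, and the helper name `simulate_echo` are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

def simulate_echo(z, h, noise_std, rng):
    """Return (noisy echo, noise-free echo) for object z and pulse h."""
    clean = np.convolve(z, h, mode="same")          # h convolved with z
    noise = noise_std * rng.standard_normal(z.shape)  # signal-independent noise
    return clean + noise, clean

rng = np.random.default_rng(1)
n = 2048
z = rng.standard_normal(n)                 # zero-mean Gaussian impedance sequence
t = np.arange(-16, 17)
# Hypothetical Gaussian-modulated pulse (0.2 cycles per sample carrier).
h = np.exp(-0.5 * (t / 4.0) ** 2) * np.sin(2 * np.pi * 0.2 * t)

r, r_clean = simulate_echo(z, h, noise_std=0.1, rng=rng)
snr_db = 10 * np.log10(np.var(r_clean) / np.var(r - r_clean))
```

Because the object is a random process, the echo is itself stochastic, which is one reason deterministic radar and sonar delay-estimation results do not carry over directly.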

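After compression (Figs. 1b and 1c), the same object region is re-scanned to 
find the postcompression echo field, 

$$ r_1(\mathbf{x}) = \int_S d\mathbf{x}'\, h(\mathbf{x} - \mathbf{x}')\, z(\mathbf{A}^{-1}\mathbf{x}' - \boldsymbol{\tau}_a) + n_1(\mathbf{x}) = \bar{r}_1(\mathbf{x}) + n_1(\mathbf{x}) , \tag{2} $$ 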



where the physical deformation of the object is reflected by a coordinate transformation 
of the object function $z(\mathbf{x})$. In modeling $r_1(\mathbf{x})$, we assume the movement 
of scatterers within all or part of the compressed object can be accurately described 
as an affine transformation of the scatterer positional coordinates. 
Specifically, we use the material m or Lagrangian H3 description of motion: 
if $\mathbf{X}$ and $\mathbf{x}$ are the pre- and postcompression coordinate vectors, respectively, 
then $\mathbf{x}(\mathbf{X}) = \mathbf{A}(\mathbf{X} + \boldsymbol{\tau}_a)$, where $\mathbf{A}$ is a linear transformation matrix and $\boldsymbol{\tau}_a$ is 
a displacement vector. $\mathbf{A}^{-1}$ exists, its determinant $\det \mathbf{A}$ is approximately one, 
and it is straightforward to interpret $\mathbf{A}$ in terms of strain, $s$, when the applied 
compression is small. 



3 Image Formation Algorithms 

For example, the top surface of the object in Fig. 1a is uniformly displaced 
in Fig. 1b along the direction of the ultrasound beam axis, $x_1$, corresponding 
to an average downward displacement and scaling transformation of the object 
coordinates with non-zero components $A_{11} = 1 - s$ and 
$A_{22} = A_{33} = (1 - s)^{-1/2} \simeq 1 + s/2$. A finite-element algorithm (FEA) was used to compute the axial 
displacement field (Fig. 1d) and longitudinal strain (Fig. 1e). Longitudinal refers 
to the component of the strain tensor in the direction of the applied force, 
in this case along $x_1$, while axial is the direction parallel to the ultrasound beam 
axis, also along $x_1$ in this example. Bright areas in the two images indicate large 
displacements (in the frame of the moving transducer!) and large strains. 

A second example is illustrated in Fig. 1c, depicting a nonuniform displacement 
of the object surface along $x_1$ at a shear angle $\beta = 5.7°$. The average 
deformation (Fig. 1) is represented as a combination of displacement, scaling, 
and shearing with non-zero matrix elements $A_{11}$, $A_{22}$, and $A_{12} = \tan\beta$. 
Non-zero, off-diagonal elements of $\mathbf{A}$ indicate shearing and rotation. 
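
The two examples can be checked numerically with a short sketch. It assumes an incompressible medium, so the lateral scale factors are taken as (1 - s)^(-1/2); the function names and the strain value s = 0.01 are hypothetical, and reusing the uniform-compression diagonal for the sheared case is an assumption, since the text does not give those values.

```python
import numpy as np

def uniform_compression(s):
    """Diagonal A for axial strain s with volume-preserving lateral expansion."""
    lateral = (1.0 - s) ** -0.5
    return np.diag([1.0 - s, lateral, lateral])

def sheared_compression(s, beta_deg):
    """Same scaling plus a shear term A_12 = tan(beta) (assumed diagonal values)."""
    A = uniform_compression(s)
    A[0, 1] = np.tan(np.radians(beta_deg))   # A_12 in 1-based notation
    return A

A_uniform = uniform_compression(0.01)
A_shear = sheared_compression(0.01, 5.7)

det_uniform = np.linalg.det(A_uniform)   # volume-preserving by construction
det_shear = np.linalg.det(A_shear)       # upper-triangular shear keeps det = 1
```

For small s the lateral factor agrees with the first-order value 1 + s/2 to within terms of order s squared, and a shear angle of 5.7 degrees gives an off-diagonal element close to 0.1.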

Of course, the average transformation cannot adequately describe the com- 
plex deformation of a large region in elastically heterogeneous media. Notice 
there are variations in the strain field near the stiff inclusion and at the edges 
where the top and bottom surfaces were not allowed to slip. Consequently echo 
fields are partitioned into small regions, such as the triangular meshes in Fig. 1, 
and $\mathbf{A}$ and $\boldsymbol{\tau}_a$ are estimated for each region using companding |S| or warping 
with deformable mesh algorithms uncn]. This particular tissue-like phantom 
was built ini and scanned at 5 MHz El to produce the experimental strain 
image of the stiff inclusion shown as an insert in Fig. 1f. 

Several image formation algorithms have been proposed to estimate strain 
under challenging experimental conditions. Most are based on least-squares 
techniques, e.g., block matching, deformable mesh it™, and filtered correlation 
nms. The least-squares approach involves ML estimation through the use 
of a wide-band ambiguity function, $\Lambda$, which aims to determine all motion 
features simultaneously. In some cases, these estimators are efficient and unbiased 
m. An expression for the wide-band ambiguity function is 

$$ \Lambda(\mathbf{B}, \boldsymbol{\tau}_b) = \int_{-\infty}^{\infty} d\mathbf{x}\, r_1(\mathbf{x})\, r_0(\mathbf{x}) , \quad \text{where } r_0(\mathbf{x}) = r(\mathbf{B}^{-1}\mathbf{x} - \boldsymbol{\tau}_b) . \tag{3} $$ 

$\mathbf{B}$ and $\boldsymbol{\tau}_b$ are the linear transformation matrix and displacement vector applied 
to the precompression echo field $r(\mathbf{x})$ to match the physical deformation of the 
object as it is modeled using $\mathbf{A}$ and $\boldsymbol{\tau}_a$. Notice that (3) is a multi-dimensional 
representation of the correlation between $r_0(\mathbf{x})$ and $r_1(\mathbf{x})$ for various values of $\mathbf{B}$ 
and $\boldsymbol{\tau}_b$. With our current algorithm m and 2-D echo fields, $\mathbf{B}$ and $\boldsymbol{\tau}_b$ provide 
a total of six motion parameters. The algorithm computes $\Lambda$ for the data within 
the triangular subregions of Fig. 1 and searches for a peak value, similar to the 
use of a cross correlator to estimate time delay. Parameter values at the peak 
become the estimates. In principle, an image formation algorithm based on the 
wide-band ambiguity function will achieve strain images with the lowest noise. 
In addition, joint estimates of $\mathbf{B}$ and $\boldsymbol{\tau}_b$ could determine the entire strain tensor 
resulting from the applied compression. 
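
The peak search can be sketched in one dimension, where the matrix B reduces to a scalar scale factor b and the displacement vector to a scalar shift tau. This is a brute-force grid search written only to illustrate the idea: the helper names `warp` and `ambiguity_peak`, the grids, the noise level, and the use of a normalized correlation are all assumptions, and the authors' 2-D algorithm with six motion parameters is not reproduced here.

```python
import numpy as np

def warp(r, x, b, tau):
    """Sample r at x / b - tau (1-D analogue of warping by B and tau_b)."""
    return np.interp(x / b - tau, x, r)

def ambiguity_peak(r, r1, x, scales, shifts):
    """Grid search for the (b, tau) pair that maximizes the correlation."""
    best_val, best_b, best_tau = -np.inf, None, None
    for b in scales:
        for tau in shifts:
            r0 = warp(r, x, b, tau)
            val = np.dot(r1, r0) / (np.linalg.norm(r1) * np.linalg.norm(r0))
            if val > best_val:
                best_val, best_b, best_tau = val, b, tau
    return best_val, best_b, best_tau

rng = np.random.default_rng(3)
n = 512
x = np.arange(n, dtype=float)
taps = np.arange(-8, 9)
r = np.convolve(rng.standard_normal(n), np.exp(-0.5 * (taps / 2.0) ** 2), mode="same")

# Hypothetical true deformation used to synthesize the post-compression line.
a_true, tau_true = 0.98, 3.0
r1 = warp(r, x, a_true, tau_true) + 0.05 * rng.standard_normal(n)

scales = 0.96 + 0.002 * np.arange(21)   # candidate scale factors
shifts = 0.5 * np.arange(13)            # candidate shifts in samples
peak, b_hat, tau_hat = ambiguity_peak(r, r1, x, scales, shifts)
```

The cost of an exhaustive search grows quickly with the number of motion parameters, which is one motivation for partitioning the echo field into small regions.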

4 Definitions 

Important familiar functions are stated below in the current notation. 

Fourier-series coefficient estimates $R_{jk}$ for the $j$th 2-D echo field and the $k$th 
spatial frequency are m 

$$ R_{jk} = \frac{1}{S} \int_S d\mathbf{x}\, r_j(\mathbf{x})\, \exp(-i 2\pi \mathbf{u}_k^t \mathbf{x}) \quad \text{for } j = 0, 1 . \tag{4} $$ 

The wavevectors $\mathbf{u}_k$ define points on an infinite 2-D grid PI|. For convenience, 
the two integer indices required to define the grid are lumped into a single index 
$k$ that enumerates all $N$ frequency points within $S$. $S = \int_S d\mathbf{x}$ is 
the measure of $S$. The Fourier transform of $r_j(\mathbf{x})$ is related to the Fourier-series 
coefficients by the expression $R_j(\mathbf{u}) = \lim_{S,N \to \infty} S\, R_{jk}$. 
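
The relation between Fourier-series coefficients and the Fourier transform can be checked numerically in 1-D. The sketch assumes the usual definition of a series coefficient, R_k = (1/S) times the integral over the support of r(x) exp(-i 2 pi u_k x), with u_k = k/S; the test signal, the support length `S`, and the sample count `N` are arbitrary choices for the example.

```python
import numpy as np

S = 16.0                      # length of the support region (assumed)
N = 256                       # number of samples across the support
dx = S / N
x = -S / 2 + dx * np.arange(N)

# Test signal with a known transform: exp(-pi x^2) <-> exp(-pi u^2).
r = np.exp(-np.pi * x ** 2)

k = np.arange(-8, 9)
u = k / S                     # discrete spatial frequencies u_k = k / S

# Fourier-series coefficients by direct numerical integration.
R_k = np.array([np.sum(r * np.exp(-2j * np.pi * uk * x)) * dx / S for uk in u])

approx_ft = S * R_k           # S * R_k approximates the continuous transform
exact_ft = np.exp(-np.pi * u ** 2)
max_err = np.max(np.abs(approx_ft - exact_ft))
```

Here the support is already much wider than the signal, so S times the coefficient matches the continuous transform to near machine precision at the sampled frequencies.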

Applying the shift and scaling theorems ISI, the forward Fourier transforms 
of cg(x) and ri(x) are, respectively, 

i?o(u) = j^{rg(x)} = detB [iL(B*u) Z(B*u)-k Ag(B*u)] and 

i?i(u) = lF{ri(x)} = det A iJ(u) Z(A*u) -I- A^i(u) , (5) 



Fig. 1. Uniform compression (a,b) generates displacement (d) and strain (e) fields. 
Shear compression (c) produces (g). FEA simulations in (d-g); phantom image in (f) 



where $Z(\mathbf{u}) = \mathcal{F}\{z(\mathbf{x})\}$, $N_j(\mathbf{u}) = \mathcal{F}\{n_j(\mathbf{x})\}$, and $\mathbf{u} = (u_1, u_2, u_3)^t$ is the
continuous spatial-frequency vector corresponding to the spatial coordinate $\mathbf{x}$.
Furthermore, we model the Fourier transform of the 2-D sensitivity function as



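\[ H(\mathbf{u}) = \mathcal{F}\{h(\mathbf{x})\} = C_0\, u_1^2 \exp(-\alpha(u_1)) \left[ \exp\!\left(-2\pi^2 (u_1 - u_0)^2 L_1^2\right) - \exp\!\left(-2\pi^2 (u_1 + u_0)^2 L_1^2\right) \right] \exp\!\left(-2\pi^2 u_2^2 L_2^2\right) \tag{6} \]

at carrier frequency $u_0$. $L_1$ and $L_2$ are pulse and beamwidth parameters, $C_0$ is
a constant, and $\alpha(u_1)$ describes attenuation losses.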

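The cross power spectral densities for continuous and discrete frequencies are

\[ G_{r_0 r_1}(\mathbf{u}_k) = S'^2\, E\{R_{0k}^* R_{1k}\} \xrightarrow[S', N \to \infty]{} G_{r_0 r_1}(\mathbf{u}) = E\{R_0^*(\mathbf{u}) R_1(\mathbf{u})\} , \tag{7} \]

the autospectral densities are $G_{r_0 r_0}(\mathbf{u}) = E\{|R_0(\mathbf{u})|^2\}$, $G_{r_1 r_1}(\mathbf{u}) = E\{|R_1(\mathbf{u})|^2\}$,
and the magnitude squared coherence function is

\[ |\gamma_{r_0 r_1}(\mathbf{u})|^2 = \frac{|G_{r_0 r_1}(\mathbf{u})|^2}{G_{r_0 r_0}(\mathbf{u})\, G_{r_1 r_1}(\mathbf{u})} , \quad \text{where } 0 \le |\gamma_{r_0 r_1}(\mathbf{u})|^2 \le 1 . \tag{8} \]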



5 Maximum-Likelihood Estimation 

Displacement. The ML estimator for displacement selects $\boldsymbol{\tau}$ to maximize the
value of the likelihood function $p(\mathbf{R}|\boldsymbol{\theta})$. $\boldsymbol{\theta}$ is a vector of all deterministic
parameters that affect the data, viz., the motion parameters $\mathbf{A}$, $\mathbf{B}$, and
$\boldsymbol{\tau}_b$ and waveform parameters $H(\mathbf{u})$, $G_{zz}(\mathbf{u})$, and $G_{nn}(\mathbf{u})$. Ignoring all terms
independent of displacement, the log-likelihood function is

/ oo 

duRe|i?S(u)i?i(u) I , (9) 

-OO ^ ^ 

where Re{- • • } is the real part of the argument and 

W\Mk) = ■ 

yjGrQVQ K)G riri (Ufc) (1 - |7 

rori (u/c)P) 

is a frequency filter. 

The ML strategy for displacement is defined by the generalized cross
correlator [23] of (9). First, choose values for $\mathbf{B}$ and $\boldsymbol{\tau}_b$ that warp the precompression
echo field to match the physical deformation defined by $\mathbf{A}$ and $\boldsymbol{\tau}_a$. Specifically,
choose $\mathbf{B}\boldsymbol{\tau}_b = \mathbf{A}\boldsymbol{\tau}_a$ so that the value of the exponential factor in the brackets of
(9) is one at all frequencies. Second, the ML estimator filters the pre- and the
postcompression waveforms each with the function $W(\mathbf{u})$. Filtering increases the
weight of the most coherent frequency components between $r_0$ and $r_1$, but only
if the statistical properties of the waveform are known a priori. Third, the
estimator cross correlates the filtered $r_0$ and $r_1$ and finds the displacement estimate
$\boldsymbol{\tau}$ at the peak cross correlation value. The main difference between the
generalized cross correlator and the wide-band ambiguity function is that the former
estimates parameters sequentially and the latter simultaneously.
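
The third step of this strategy, locating the peak of the cross correlation, can be illustrated with a minimal one-dimensional sketch. It is not the estimator used in the paper: the warping step and the filter $W(\mathbf{u})$ are omitted, the resolution is limited to whole samples, and the function name `estimate_delay` is a hypothetical choice made for this illustration.

```python
import numpy as np

def estimate_delay(pre, post):
    """Return the integer lag (in samples) at which the cross correlation peaks.

    A positive result means `post` is a delayed copy of `pre`.
    Illustrative only: no warping and no spectral weighting are applied.
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    # Full linear cross correlation: c[k] = sum_n post[n + k] * pre[n].
    xcorr = np.correlate(post, pre, mode="full")
    # Output index i corresponds to lag i - (len(pre) - 1).
    lags = np.arange(-(len(pre) - 1), len(post))
    return int(lags[np.argmax(xcorr)])
```

Sub-sample displacement estimates would require interpolating around the peak, which this sketch does not attempt.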



Strain. The total displacement, $\mathbf{v}$, at each position in the deformed medium is
given by the sum of the displacement applied during the echo field warp, $\boldsymbol{\tau}_b$, and
that determined from the cross correlation measurement, $\boldsymbol{\tau}$, i.e., $\mathbf{v} = \boldsymbol{\tau} + \boldsymbol{\tau}_b$.
The Eulerian strain tensor is



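\[ \epsilon_{mn} = \frac{1}{2} \left( \frac{\partial v_n}{\partial x_m} + \frac{\partial v_m}{\partial x_n} \right) \tag{10} \]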



and the longitudinal strain measured along the axis of the sound beam is 



\[ s \triangleq \epsilon_{11} = \frac{\partial v_1}{\partial x_1} . \tag{11} \]



For imaging, strain is adequately approximated using difference formulas.
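
A difference approximation to the longitudinal strain $s = \partial v_1 / \partial x_1$ can be sketched as follows. This is an illustrative forward difference under the assumption that displacement estimates are available at uniformly spaced positions along the beam axis; the name `longitudinal_strain` and the parameter `spacing` are hypothetical, and the particular difference formula used in the paper is not specified in this text.

```python
import numpy as np

def longitudinal_strain(v1, spacing):
    """Approximate s = dv1/dx1 by forward differences.

    v1      : axial displacement estimates at uniformly spaced positions
    spacing : distance between adjacent estimates (same length unit as v1)
    Returns an array with one fewer element than v1.
    """
    v1 = np.asarray(v1, dtype=float)
    return np.diff(v1) / spacing
```

For a displacement that varies linearly with depth, the estimate is constant and equal to the applied strain.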



6 Variance Bounds 



From the likelihood function of (9) and the strain estimate of (11), we computed
the Cramér-Rao lower bound on strain variance for unbiased estimates to find



m 



2(^22^! -I- A 22 Y 2 ) 

TiAT (All A22 - A12A21)" EiFa 



var(s) > 



( 12 ) 
where $T_1$ is the length of the data segment and $\Delta T$ is the distance between
overlapping data segments. These experimental parameters are selected when using
the difference equation to approximate the derivative in (11). $\Delta T$ is
particularly significant since it determines the axial pixel size of the strain image. $A_{mn}$
are components of the transformation matrix $\mathbf{A}$, and $Y_1$ and $Y_2$ are frequency
integrals

\[ Y_i \triangleq 2S' \int d\mathbf{u}\; (2\pi u_i)^2\, \frac{|\gamma_{r_0 r_1}(\mathbf{u})|^2}{1 - |\gamma_{r_0 r_1}(\mathbf{u})|^2} . \]



Equation (12) describes how longitudinal strain errors are affected by motion in
the plane: motion that includes scaling, shearing, and rotation.

We predicted strain errors from the variance bound of (12). These were
compared with standard deviations of strain measurements obtained from a
tissue-like phantom acquired under nearly ideal conditions to assess estimation
efficiency (Fig. 2). An elastically homogeneous material was uniformly
compressed, as in Fig. 1, while strain was measured in a small region near the
center. Applied strains in excess of 1% generated significantly more error than
that predicted because the predictions ignore the effects of echo-data sampling.
The log-likelihood function was derived assuming a continuous waveform model
with large object support. In reality, data are sampled at different rates along
$x_1$ and $x_2$; typical sampling intervals for a linear array are $\Delta x_1 = 0.015$ mm and
$\Delta x_2 = 0.180$ mm. So the predictions of (12) are not limited by the effects of
aliasing or small data sets used to estimate displacement at high resolution. A
continuous model led to error bounds, but to obtain a more realistic evaluation
of elasticity system design, including the limitation of spatial resolution, we turn
now to a discrete waveform model and an analysis of Fourier crosstalk.



7 Discrete Waveform Model 



The process of imaging a continuous object $z(x)$ with a linear system
characterized by the sensitivity function $h_m(x)$ to produce the $m$th discrete measurement
sample in noise is represented by

\[ r_m = \int dx\; h_m(x)\, z(x) + n_m . \tag{13} \]



$M$ measurements are recorded from signals generated within the object support
$S$ such that $-(M-1)/2 \le m \le (M-1)/2$ where, for convenience, $M$ is an odd
integer. The discussion is limited to one spatial dimension for simplicity. The
object function is exactly represented by the Fourier series

\[ z(x) = \sum_{\ell=-\infty}^{\infty} Z_\ell\, e^{i2\pi u_\ell x}\, S(x) , \tag{14} \]
where $Z_\ell$ are Fourier coefficients and $e^{i2\pi u_\ell x}$ are Fourier basis functions. In our
one-dimensional example, $S(x) = \mathrm{rect}(x/S')$ and $u_\ell = \ell/S'$. Integer indices $m$
and $\ell$ indicate digitized waveform samples and frequencies, respectively.
Combining (13) and (14), the noise-free echo measurements are

\[ r_m = \sum_{\ell=-\infty}^{\infty} \Psi_{m\ell}\, Z_\ell , \quad \text{where } \Psi_{m\ell} = \int dx\; h_m(x)\, e^{i2\pi u_\ell x}\, S(x) \tag{15} \]

are components of an $M \times \infty$ matrix $\boldsymbol{\Psi}$ whose rows are the Fourier transforms
of the product of the sensitivity function and support function for each
measurement. Equation (15) expresses the $m$th waveform sample as the sum over
frequency of the components $\Psi_{m\ell}$ weighted by their respective Fourier
coefficients. The object contributes to $r_m$ through $Z_\ell$, while the measurement process
contributes to $r_m$ through $\Psi_{m\ell}$.

Example: Ideal Ultrasonic Imaging System. Consider the perfect, linear,
shift-invariant (LSI) system with sensitivity function

\[ h_m(x) = \frac{1}{2\pi z_0} \frac{d^2}{dx^2}\, \delta(x - m\Delta x) . \]

For large support, (13) gives

\[ r_m = \frac{1}{2\pi z_0} \int_{-\infty}^{\infty} dx\; z(x)\, \frac{d^2}{dx^2}\, \delta(x - m\Delta x) = \frac{1}{2\pi z_0} \left( \frac{d^2 z(x)}{dx^2} \right)_{x = m\Delta x} , \]

which is precisely the acoustic scattering function of the object, i.e., the second
derivative of the relative impedance profile $z(x)/z_0$. Images from this ideal
ultrasonic system reproduce the object function without distortion. We have assumed
that the sampling interval satisfies the Nyquist criterion; specifically, if $Z_\ell = 0$
for $\ell > N$, then $\Delta x < S'/N$. The row (column) vectors of $\boldsymbol{\Psi}$ are orthogonal for
the ideal system, and their components are second derivatives of the object basis
functions.



To study displacement or strain, we must analyze the relationship between 
two ultrasonic echo fields. A discrete representation of those fields is 



ror. 



00 r « 1 00 

= Y, Zi B dx hm{,x) ^ Zi 

&— — 00 L J 



00 r « 1 CXD 

rim ^ ^ 

fc-00 J 



Warped Pre 

( 16 ) 

Elmt Zl . Post 



If the support is large compared with the range of motion, then the integrals
over $S$ before and after the coordinate transformation are approximately equal.
Equation (16) expresses an important symmetry: imaging a deformed object is
mathematically equivalent to imaging the undeformed object with a deformed
sensitivity function.



8 Crosstalk Matrix 

Following the ML strategy of (9), we produce $r_0$ by warping $r$ to match the
physical deformation in $r_1$ (16). The echo fields are then filtered and cross
correlated. A discrete form of the cross correlation function $\phi_q$ at lag index $q$ for a
particular $r_0$ and $r_1$ is

\[ E\{\phi_q\}_{n|z} = \sum_{m=-(M-1)/2}^{(M-1)/2} r_{0m}^*\, r_{1(m+q)} = \sum_{\ell} \sum_{\ell'} Z_\ell^*\, Z_{\ell'}\, \beta_{q\ell\ell'} , \tag{17} \]

where the ensemble average is over all noise realizations for a specific object,
and

\[ \beta_{q\ell\ell'} \triangleq \sum_{m=-(M-1)/2}^{(M-1)/2} \Psi_{0m\ell}^*\, \Psi_{1(m+q)\ell'} \tag{18} \]

is the crosstalk matrix for ultrasonic displacement estimation. For an LSI system,
$h_m(x) = h(x - m\Delta x)$. Hence, a general form of the crosstalk matrix is



(M-l)/2 

!3qU- = X! 

m=-(M-l)/2 



B H*{ue)e 



— i27TU£ - 



H 



(t) 



B H- 






o-i27TU^/(Ta-2^) 



M 



gj 27 i-«f, ( -r„) 

sincM Aa; 

sine (f - Ax 



(19) 



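Frequencies $u_\ell$ refer to the warped precompression data while $u_{\ell'}$ refer to the
postcompression data. We used the Dirichlet kernel in the expression above:

\[ \sum_{m=-(M-1)/2}^{(M-1)/2} e^{i2\pi m y} = M\, \frac{\mathrm{sinc}\, My}{\mathrm{sinc}\, y} , \quad \text{where } \mathrm{sinc}\, x \triangleq \frac{\sin \pi x}{\pi x} . \]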
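
The Dirichlet kernel identity can be checked numerically. The following is a minimal sketch with hypothetical function names; it relies on NumPy's `np.sinc`, which uses the same normalized definition $\sin(\pi x)/(\pi x)$ as the text, and it assumes $M$ is odd and $y$ is not a nonzero integer (where the closed form divides by zero).

```python
import numpy as np

def dirichlet_sum(M, y):
    """Direct evaluation of sum_{m=-(M-1)/2}^{(M-1)/2} exp(i 2 pi m y) for odd M."""
    half = (M - 1) // 2
    m = np.arange(-half, half + 1)
    return np.sum(np.exp(2j * np.pi * m * y))

def dirichlet_closed_form(M, y):
    """Closed form M * sinc(M y) / sinc(y), with sinc(x) = sin(pi x) / (pi x)."""
    return M * np.sinc(M * y) / np.sinc(y)
```

At the harmonic spacing $y = 1/M$ the sum vanishes, which is the orthogonality property invoked for uniformly sampled data.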



The crosstalk matrix predicts the coherence between $r_0$ and $r_1$ for an
experiment independent of the object, and therefore provides the comparison we
require for evaluating alternative experimental designs. In general, the matrix is
complex with three indices: $\ell$ and $\ell'$ are over frequency and $q$ is over space.

The first factor on the right-hand side of (19) is a band-pass filter; the
diagonal elements of the matrix, $\beta_{q\ell\ell}$, are the system transfer function that defines
the sensitivity and spatial resolution of the ultrasound system for estimating
displacement.

The second factor in (19) is the phase shift corresponding to the object
displacement. When the correlation lag $q\Delta x$ equals the scaled translation of
the object $A\tau_a$, displacement estimates are accurate, the second factor is unity,
and the crosstalk matrix is real. As with the Cramér-Rao bound, we assume
$q\Delta x = A\tau_a$ to focus on the two frequency dimensions of the crosstalk matrix,
$\beta_{\ell\ell'} = \beta_{q\ell\ell'}|_{q = A\tau_a/\Delta x}$. The most important use of the second factor is to describe
the effects of estimating a discrete displacement value $q\Delta x/A$ when in fact the
true displacement $\tau_a$ is continuous.

The third factor in (19) is the crosstalk between Fourier components of $r_0$
and $r_1$. Under-sampling and incomplete warping generate non-zero off-diagonal
components that indicate energy from frequency channels in $r_0$ is not being
placed into the same frequency channels in $r_1$. Increased crosstalk is exactly
what is meant by a loss of waveform coherence.



9 Examples of Strain Imaging 

Using $\beta_{\ell\ell'}$, the Gaussian system response function of (6), and applying the ML
strategy of (9), we can predict realistic consequences for strain noise of using
sampled ultrasonic waveforms. Figures 3 and 4 illustrate $\beta$ for a typical 5 MHz
linear array configuration when there is no object deformation. Figure 3
illustrates the crosstalk matrix along the axis of the ultrasound beam $x_1$, where the
echo field consists of a set of band-pass signals whose spectra are peaked at
±5 MHz. Figure 4 is an image of the crosstalk matrix along the axis $x_2$ that
is perpendicular to the beam axis, where the echo field consists of base-band
signals peaked at 0 MHz.^1 Figures 5 and 6 are analogous to 3 and 4 except a 5%
scaling deformation was applied to the former along $x_1$. Each quadrant of Figs.
3 and 5 is a copy of the others except for polarity. Off-diagonal components for
band-pass signals (Figs. 3 and 5) are not crosstalk. These are cross terms from
the correlation at $u$ = ±5 MHz.





Fig. 2. Strain errors: predicted and measured (•)
Fig. 3. Axial crosstalk matrix, no deformation
Fig. 4. Lateral crosstalk matrix, no deformation



^ Bandwidth was extended to facilitate clearer comparisons between Figs. 0and0 
The 41 × 41 element matrices depicted in these figures include 20 negative
frequencies, 20 positive frequencies, and $u_\ell = u_{\ell'} = 0$ at the center. System
parameters were selected to represent broad-band pulsed transmission with a
5 MHz carrier frequency from a Gaussian-weighted aperture. Waveforms were
sampled at the Nyquist rate. Examining any quadrant of the band-pass matrix
(Fig. 3) or the entire base-band matrix (Fig. 4), we find no crosstalk (non-zero,
off-diagonal elements) because the ultrasound data are adequately sampled and
there is only rigid-body translation. This highly coherent system is limited only
by bandwidth and additive noise. Sensitivity is low only for motion of small
scattering structures detectable at the highest spatial frequencies.

Figures 5 and 6 show $\mathrm{Re}\{\beta_{\ell\ell'}\}$ for the same measurement system but with
uncompensated deformation: $A_{11} = 0.95$, $A_{22} = 1.05$, $A_{12} = A_{21} = 0$ and
$\mathbf{B} = \mathbf{I}$. The effect of axial compression on the axial crosstalk matrix (Fig. 5) is
to "rotate" the matrix patterns counterclockwise about the origin. Rotation is a
consequence of the third factor in (19). Hence the greatest loss of coherence, i.e.,
the most energy removed from the diagonal, occurs at the high frequencies. The
concomitant lateral expansion is seen in the lateral crosstalk matrix (Fig. 6)
as a clockwise rotation of the patterns about the origin. Deformation creates
low-amplitude crosstalk that appears as ringing in Figs. 5 and 6 from the loss
of matrix orthogonality. Normally, patterns along any row or column are given
by the ratio of sinc functions in (19). With a constant sampling interval, the
frequency components are naturally orthogonal since the harmonic frequencies
occur at zeros of the sinc function. In summary, the crosstalk matrix shows us
that uncompensated deformation reduces waveform coherence by (a) misplacing
information along the matrix diagonal, (b) disturbing frequency component
orthogonality, and (c) aliasing signal components just beyond the Nyquist
frequency (see the upper-right and lower-left corners of the figure). Crosstalk is eliminated
by accurate warping, viz., $\mathbf{A}\boldsymbol{\tau}_a = \mathbf{B}\boldsymbol{\tau}_b$.
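
The orthogonality argument can be illustrated with a simplified one-dimensional stand-in for the crosstalk factor. This sketch is not the paper's crosstalk expression: it assumes that the factor reduces to a normalized Dirichlet sum over the frequency mismatch between a hypothetically scaled postcompression frequency and the precompression frequency, with harmonic frequencies $u_\ell = \ell/(M\Delta x)$. The function name `crosstalk_factor` and the scalar parameter `scale` are assumptions made for illustration only.

```python
import numpy as np

def crosstalk_factor(M=41, scale=1.0):
    """Return the M x M matrix (1/M) * sum_m exp(i 2 pi m dx (scale*u_l' - u_l)).

    Rows index the frequency l, columns index l'. With u_l = l / (M dx) the
    product dx * u_l equals l / M, so the sampling interval dx cancels.
    """
    half = (M - 1) // 2
    idx = np.arange(-half, half + 1)  # serves as sample index m and frequency index l
    mismatch = (scale * idx[None, :] - idx[:, None]) / M
    phases = np.exp(2j * np.pi * idx[:, None, None] * mismatch[None, :, :])
    return phases.sum(axis=0) / M
```

With `scale=1.0` the result is the identity matrix, i.e., no crosstalk. With `scale=0.95` the diagonal is attenuated most strongly at the highest frequencies and energy appears off the diagonal, in qualitative agreement with the description above.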

The diagonal of the crosstalk matrix may be used to quantify spatial
resolution for displacement estimates. We plotted diagonal components of the axial
crosstalk matrix in Fig. 7 for 0%, 2.5%, and 5% axial compression. The plots
show that deformation reduces the effective bandwidth for displacement
estimation and that high spatial frequencies are preferentially lost. Warping, however,
only partially restores the lost resolution. The full matrix shows the location of
energy missing from the diagonal.

While the crosstalk matrix reveals many aspects of strain imaging physics, it
is convenient to develop a scalar quantity that summarizes design performance.
We propose the trace of the crosstalk matrix divided by its $L_2$ norm as that
figure of merit:



\[ \mathcal{B} \triangleq \frac{\mathrm{tr}\, \boldsymbol{\beta}}{\|\boldsymbol{\beta}\|_2} . \tag{20} \]

We computed (20) for the longitudinal strain images shown in Fig. 8 to see if
the figure of merit $\mathcal{B}$ correlates with visual impressions of image quality. Each echo field was
simulated by combining an ultrasound waveform simulator with a FEA as described
previously. Relative to the tissue-like background, the bright circles are soft
and the dark circles are hard. The object was "scanned" with linear array
transducers having different point spread functions (psfs) whose envelopes are shown
enlarged near the upper right corner. Along $x_1$ the same Gaussian pulse length
was applied for all, but the aperture function varied, giving different psfs along $x_2$.
We used a rectangular aperture in Figs. 8a-c with the corresponding f-numbers
f/1.5, f/3.0, and f/4.5. In Fig. 8d, the f/3.0 aperture of 8b was apodized using
a Hanning function weighting to suppress side lobes at the expense of the main
lobe width.

Fig. 5. Axial crosstalk, 5% axial compression
Fig. 6. Lateral crosstalk, 5% axial compression
Fig. 7. Crosstalk matrix diagonals for three compressions

The object was scanned, compressed 3%, and rescanned, as in Fig. 1b, to
generate the strain images of Fig. 8. Noise was greatest in Fig. 8c, which had the
most weakly focused beam, and least in Fig. 8a, which had the most strongly
focused beam. Axial shear occurring near the inclusions reduces waveform
coherence more for wide beams than narrow. The rank order of the figure of merit, shown in Fig. 8,
tracks the visual impression of image quality and such quantitative measures as
mean-square error. Consequently, the quickly computed figure of merit correctly predicted
that decorrelation noise is minimized with the most focused beams. Figure 8 is
a realistic simulation describing how the crosstalk matrix enables designers to
explore the physics of strain imaging and provide summary measures of design
quality.
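
The proposed figure of merit, the trace of the crosstalk matrix divided by its $L_2$ norm, is inexpensive to evaluate. The sketch below makes two assumptions that the text does not settle: the "$L_2$ norm" is taken to be the Frobenius norm (root sum of squared magnitudes of all elements), and the magnitude of the trace is used because the matrix is complex in general. The function name `figure_of_merit` is hypothetical.

```python
import numpy as np

def figure_of_merit(beta):
    """Trace of the crosstalk matrix divided by its norm.

    Assumptions (not confirmed by the source): the norm is the Frobenius
    norm, and the magnitude of the trace is taken for complex matrices.
    """
    beta = np.asarray(beta)
    return np.abs(np.trace(beta)) / np.linalg.norm(beta, "fro")
```

Under these assumptions an N × N identity matrix (no crosstalk) gives the value √N, and any energy moved off the diagonal lowers the value.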

10 Summary 

The ML estimator for displacement is consistent with the well-known
generalized cross correlator for time-delay estimation [23]. Another implementation
is the wide-band ambiguity function, which estimates all motion
parameters simultaneously. Comparisons of measured variances with the Cramér-Rao
lower bound showed that our implementation of the ML estimator using sampled
waveforms was not efficient for compressions greater than 1% (Fig. 2). A
discrete waveform model was developed to formulate the Fourier crosstalk matrix
for ultrasonic strain imaging using sampled waveforms, and thus obtain a
realistic means for evaluating experimental system designs from first principles. The
crosstalk matrix was found to provide important new insights into experimental
design, e.g., the diagonal of the matrix evaluated at the true displacement value
is a rigorous measure of spatial resolution for displacement and strain estimation.

Fig. 8. Simulated strain images for different ultrasonic point spread functions shown
at upper right. Four stiff (dark) and soft (bright) targets are included. Values of the
figure of merit, (20), are shown.
Abstract. We describe the use of truncated multipolar expansions for
producing dynamic images of cortical neural activation from measure- 
ments of the magnetoencephalogram. We use a signal-subspace method 
to find the locations of a set of multipolar sources, each of which repre- 
sents a region of activity in the cerebral cortex. Our method builds up 
an estimate of the sources in a recursive manner, i.e. we first search for 
point current dipoles, then magnetic dipoles, and finally first order mul- 
tipoles. The dynamic behavior of these sources is then computed using 
a linear fit to the spatiotemporal data. The final step in the procedure 
is to map each of the multipolar sources into an equivalent distributed 
source on the cortical surface. The method is demonstrated through a 
Monte Carlo simulation. 



1 Introduction 

Magnetoencephalography (MEG) data are measurements of the magnetic fields 
produced by neural current sources within the brain. The problem of estimating 
these sources is highly ill-posed due to the inherent ambiguities in the associated 
quasistatic electromagnetic inverse problem, the limited number of spatial mea- 
surements and significant noise levels. To overcome these problems, constraints 
can be placed on the location and form of the current sources. Mapping studies 
using direct electrical measurements, fMRI and PET reveal discrete focal areas 
of strong activation within the cortex that are associated with specific cognitive, 
sensory and motor activities. Consequently, a plausible model for the current 
generators in an event-related study consists of a number of focal cortical
regions, each of which has an associated time course. The MEG inverse problem
requires estimation of the spatial and temporal characteristics of these sources. 

There are two major classes of methods for solving the MEG inverse problem 
which we will refer to as “imaging” and “model based.” The imaging methods 
typically constrain sources to a tessellated representation of the cortex, assume 
an elemental current source in each area element, and solve the linear inverse
problem that relates these current sources to the measured magnetic field.
Accurate tessellations of the cortex require on the order of $10^5$ elements. Since
the maximum number of MEG sensors in the current generation of whole-head
MEG systems is approximately 300, the problem is highly underdetermined. By
using regularized linear methods based on minimizing a weighted $L_2$-norm on
the image, we can produce unique stable solutions. Unfortunately, these
methods tend to produce very smooth solutions that are inconsistent with the
focal model described above. Many nonlinear algorithms have been proposed that
attempt to avoid this oversmoothing problem. While they have met with some
success, the cost functions required to achieve more focal solutions are usually
highly nonconvex and computation times can be very high.

The model-based methods assume a specific parametric form for the sources.
By far the most widely used models in MEG are multiple current dipoles.
These assume that the neural sources are relatively small in number and each
sufficiently focal that they can be represented by a few equivalent current dipoles
with unknown locations and orientations. Parametric methods can be extended
to model the temporal correlation expected in the solutions through fitting the
multiple dipole model to the entire data set and estimating the time course
for each estimated dipole location. As with the nonlinear imaging methods, the
cost functions are nonconvex. Signal subspace based methods such as MUSIC
or RAP-MUSIC can be used to rapidly locate the sources in a sequential
fashion and avoid the problem of trapping in local minima.

The equivalent current dipole model is directly interpretable as a current 
element restricted to the cortical surface. As discussed in [10], the dipole may
also represent locally distributed sources that are not necessarily restricted to 
a single point. However, one of the perceived key limitations is that these dis- 
tributed sources may not be adequately represented by the dipole model. This 
problem was one of the prime motivations for the development of the imaging 
approaches. An alternative solution is to remain within the model-based frame- 
work but to broaden the model to allow parametric representations of distributed 
sources. The multipolar expansion provides a natural framework for generating 
these models. The multipolar expansions are formed using a Taylor series repre- 
sentation of the magnetic field equations. If the expansion point is chosen near 
the center of a distributed source, then the contribution of higher order terms 
will drop off rapidly as the distance from the source to the sensor increases. 
Using this framework we expand the set of sources to include magnetic dipoles 
and first order multipoles. These sources are able to represent the field from a
distributed source more accurately than the current dipole. While the idea of
using multipolar expansions in MEG source modeling is not new, the approach
has generally seen only limited use, e.g., in magnetocardiography.

The parameters of the estimated higher-order multipolar terms are not eas- 
ily related to the actual physiological processes that produce the MEG signals. 
We describe here a two-stage procedure in which we first estimate the locations 
and parameters of the multiple multipoles, then relate each of the multipoles to 
equivalent cortical sources. The method described here for estimating the
location and moment parameters of these multipolar representations is an extension
of the RAP-MUSIC method developed for localizing current dipoles. The
algorithm recursively builds a model for the current source configuration by first
testing for the presence of point current dipoles, then magnetic dipoles, and
finally first order multipoles. In this way the model order and complexity are
gradually increased until the combined estimated sources adequately explain the data.

In the cortical re-mapping stage, we find regions of cortex in the vicinity 
of the parametric source on which we fit current distributions consistent with 
the fields associated with each estimated multipole. The final result is then a 
dynamic image of current activity mapped onto a tessellated representation of 
the cortex which reveals the time varying behavior at the various locations on 
the cerebral cortex activated during a particular experiment. 



2 Multipolar Source Modeling 

2.1 Multipolar Expansions 

The relationship between the measured magnetic field and the current sources 
is determined by the quasistatic form of Maxwell’s equations. In the special 
case in which the head is modeled as a set of concentric nested spheres, each 
with uniform and isotropic conductivity, there is a simple integral equation that 
relates the external magnetic field to the current sources. We use this result 
to derive the multipolar expansion. We include details only for the case where 
measurements are made of the radial component of the magnetic field. They
extend directly both to the case of non-radial magnetic field measurements and
to measurements collected using an EEG system.


Fig. 1. Primary neural activity of current density $\mathbf{j}^p(\mathbf{r}')$ at location $\mathbf{r}'$ inside a closed
conducting volume generates an external magnetic field at location $\mathbf{r}$ as detected by a
magnetometer with radial orientation $\mathbf{r}/r$, to yield the scalar magnetic measurement
$b_r(\mathbf{r})$. We develop a multipolar expansion for sources in a small region, $G$, using a
Taylor series for local displacement $\mathbf{x}$
A truncated multipolar expansion will be used to represent the measured
magnetic field for the case of a current source restricted to a relatively small
volume, $G$, as illustrated in Fig. 1. As the extent of the source grows, more terms
are required in the expansion to adequately represent the external magnetic field.
In the following we will develop expressions for the special cases of (i) point
sources that are exactly represented as point current dipoles, (ii) highly focal
sources that can be represented by a magnetic dipole model, and (iii) locally
distributed sources that can be represented by a first-order multipole model.

The external magnetic field is generated by the sum of the primary neural
activity, designated by the current density vector $\mathbf{j}^p(\mathbf{r}')$, and the volume or return
currents resulting from the electric field produced by the current source. It is the
primary currents that are the sources of interest in MEG inverse problems.
The contribution of the volume currents to the external field must be accounted
for but the currents themselves are of little interest. In the special case treated
here of radial measurements for sources confined to a spherical volume, the
volume currents do not contribute to the measured field, and the radial component
$b_r(\mathbf{r})$ of the magnetic field $\mathbf{b}(\mathbf{r})$ at location $\mathbf{r}$ is given by direct extension of the
well known Biot-Savart equation:



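\[ b_r(\mathbf{r}) = \frac{\mathbf{r}}{r} \cdot \frac{\mu_0}{4\pi} \int_G \frac{\mathbf{j}^p(\mathbf{r}') \times \mathbf{d}(\mathbf{r},\mathbf{r}')}{d(\mathbf{r},\mathbf{r}')^3}\, d\mathbf{r}' = \frac{\mathbf{r}}{r} \cdot \frac{\mu_0}{4\pi} \int_G \frac{\mathbf{M}(\mathbf{r}')}{d(\mathbf{r},\mathbf{r}')^3}\, d\mathbf{r}' , \tag{1} \]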



where $\mathbf{d}(\mathbf{r},\mathbf{r}') = \mathbf{r} - \mathbf{r}'$ is the distance vector between the two arguments,
$d(\mathbf{r},\mathbf{r}') = \|\mathbf{r} - \mathbf{r}'\|$ the corresponding scalar distance, and $G$ is any volume
containing the source. For the final equality, we define the magnetic moment
density or magnetization as M(r') = r' x j^{r') (e.g., 0(eq- 5.53)). 

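The multipolar representation is found using the Taylor series expansion of
a scalar function

\[ \psi(\mathbf{r} + \mathbf{x}) = \sum_{n=0}^{\infty} \frac{(\mathbf{x} \cdot \nabla)^n}{n!}\, \psi(\mathbf{r}) \tag{2} \]

applied to the distance $d(\mathbf{r},\mathbf{r}')$, where $\nabla$ represents the gradient with respect
to $\mathbf{r}$. Using the equalities $\nabla \mathbf{r} = \mathbf{I}$ (where $\mathbf{I}$ is the identity matrix), $\nabla r^n =
\nabla (\mathbf{r} \cdot \mathbf{r})^{n/2} = n r^{n-2} \mathbf{r}$, and $\nabla d(\mathbf{r},\mathbf{r}')^n = -\nabla' d(\mathbf{r},\mathbf{r}')^n = n\, d(\mathbf{r},\mathbf{r}')^{n-2}\, \mathbf{d}(\mathbf{r},\mathbf{r}')$
(where $\nabla'$ is w.r.t. the primed variable), yields the expansion about $\mathbf{r}'$:

\[ d(\mathbf{r},\mathbf{r}' + \mathbf{x})^{-3} = d(\mathbf{r},\mathbf{r}')^{-3} + 3\, d(\mathbf{r},\mathbf{r}')^{-5}\, (\mathbf{x} \cdot \mathbf{d}(\mathbf{r},\mathbf{r}')) + \cdots \tag{3} \]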
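
The quality of the first-order truncation of the inverse-cube distance can be checked numerically. The sketch below compares the exact value of $d(\mathbf{r},\mathbf{r}'+\mathbf{x})^{-3}$ with the zeroth-order term plus the first-order correction $3\,d^{-5}(\mathbf{x}\cdot\mathbf{d})$; the function names and the example geometry (a sensor 12 cm from the origin and a source offset of a few millimetres) are assumptions chosen only for illustration.

```python
import numpy as np

def inv_cube_exact(r, r_src, x):
    """Exact value of d(r, r_src + x)^(-3)."""
    return np.linalg.norm(r - (r_src + x)) ** -3

def inv_cube_first_order(r, r_src, x):
    """Zeroth- plus first-order Taylor terms of d(r, r_src + x)^(-3) about r_src."""
    d_vec = r - r_src
    d = np.linalg.norm(d_vec)
    return d ** -3 + 3.0 * d ** -5 * np.dot(x, d_vec)
```

When the offset is small compared with the source-to-sensor distance, the first-order expression is markedly closer to the exact value than the zeroth-order term alone.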



To produce the multipolar expansion, we use (3) to expand (1) about $\mathbf{r}_l$, the
centroid of the region to which the primary source is confined (cf. jO] (eq. 9.3.18)): 



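\[ b_r(\mathbf{r}) = \frac{\mu_0}{4\pi}\, \frac{\mathbf{r}}{r} \cdot \int_G \left[ \frac{\mathbf{M}(\mathbf{r}_l + \mathbf{x})}{\|\mathbf{r} - \mathbf{r}_l\|^3} + \frac{3\, \mathbf{M}(\mathbf{r}_l + \mathbf{x})}{\|\mathbf{r} - \mathbf{r}_l\|^5}\; \mathbf{x} \cdot (\mathbf{r} - \mathbf{r}_l) + \cdots \right] d\mathbf{x} . \tag{4} \]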



MEG Source Imaging Using Multipolar Expansions 






If $\|\mathbf{x}\| \ll \|\mathbf{r} - \mathbf{r}_l\|$, then we may generally neglect the higher order terms.
From Fig. 1 we can see that this inequality is equivalent to the extent of the
distributed source being much smaller than the distance from the source to the
sensor. We now consider the three types of sources that will be used to represent
regions of increasing size in our model of cortical activation.

Point Current Dipole: We consider first the case where the current source
is confined to a single point, i.e. $\mathbf{j}^p(\mathbf{r}') = \delta(\mathbf{r}' - \mathbf{r}_l)\, \mathbf{q}$ where $\mathbf{q}$ is the current dipole
moment and $\delta$ is the Dirac delta functional. Substitution into (4) produces the
result

\[ b_r(\mathbf{r}) = \frac{\mu_0}{4\pi}\, \frac{\mathbf{r} \times \mathbf{r}_l}{r\, d(\mathbf{r},\mathbf{r}_l)^3} \cdot \mathbf{q} , \tag{5} \]



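since all terms but the first are identically zero. This is the standard current
dipole model that is widely used in the MEG and EEG literature. The source is
characterized by the location $\mathbf{r}_l$ and moment $\mathbf{q}$.

Magnetic Dipole: We now consider the effect of allowing the extent of the
source to grow so that it can no longer be represented using a delta function.
We let the extent of the source be sufficiently small that the second and higher
order terms are negligible, and we rewrite the first term of (4) as

\[ b_r(\mathbf{r}) = \frac{\mu_0}{4\pi}\, \frac{\mathbf{r}}{r\, d(\mathbf{r},\mathbf{r}_l)^3} \cdot \left( \int_G \mathbf{M}(\mathbf{r}_l + \mathbf{x})\, d\mathbf{x} \right) = \frac{\mu_0}{4\pi}\, \frac{\mathbf{r}}{r\, d(\mathbf{r},\mathbf{r}_l)^3} \cdot \mathbf{m} , \tag{6} \]

where we define $\mathbf{m}$ to be the magnetic dipole moment

\[ \mathbf{m} = \int_G (\mathbf{r}_l + \mathbf{x}) \times \mathbf{j}^p(\mathbf{r}_l + \mathbf{x})\, d\mathbf{x} . \tag{7} \]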

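Thus we can characterize the magnetic dipole with the moment vector $\mathbf{m}$ and
location $\mathbf{r}_l$. In (7) we can define $\mathbf{q}(\mathbf{r}_l) = \int_G \mathbf{j}^p(\mathbf{r}_l + \mathbf{x})\, d\mathbf{x}$ to be the equivalent
current dipole moment and $\tilde{\mathbf{m}}(\mathbf{r}_l) = \int_G \mathbf{x} \times \mathbf{j}^p(\mathbf{r}_l + \mathbf{x})\, d\mathbf{x}$ to be the local magnetic
dipole moment, i.e. a local "spin" of the source about a central point. We can
therefore express the magnetic moment as $\mathbf{m}(\mathbf{r}_l) = \mathbf{r}_l \times \mathbf{q}(\mathbf{r}_l) + \tilde{\mathbf{m}}(\mathbf{r}_l)$, and the
magnetic dipole includes the equivalent current dipole as the special case.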
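
The decomposition of the magnetic dipole moment into an equivalent current dipole part and a local "spin" part can be verified for a discretized source. In this sketch the integral over the source region is replaced by a sum over hypothetical current elements located at offsets from the expansion point; the function name `moments` and the discretization are assumptions made for illustration.

```python
import numpy as np

def moments(r_l, offsets, currents):
    """Discrete analogues of q, the local magnetic moment, and the total moment m.

    r_l      : (3,) expansion point
    offsets  : (K, 3) element positions relative to r_l
    currents : (K, 3) current moment of each element (current density times volume)
    """
    q = currents.sum(axis=0)                                  # equivalent current dipole
    m_local = np.cross(offsets, currents).sum(axis=0)         # sum of x cross j
    m_total = np.cross(r_l + offsets, currents).sum(axis=0)   # sum of (r_l + x) cross j
    return q, m_local, m_total
```

For any set of elements, the total moment equals `np.cross(r_l, q) + m_local`, which is the discrete form of the relation stated above.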

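First-Order Multipole: Now we consider the final case where the source is
sufficiently large that the first two terms in the Taylor series should be included.
In this case we can rewrite (4) as

\[ b_r(\mathbf{r}) = \frac{\mu_0}{4\pi}\, \frac{\mathbf{r}}{r\, d(\mathbf{r},\mathbf{r}_l)^3} \cdot \left( \mathbf{m} + \frac{3\, (\mathbf{Q}(\mathbf{r}_l) \cdot \mathbf{d}(\mathbf{r},\mathbf{r}_l))}{d(\mathbf{r},\mathbf{r}_l)^2} \right) , \tag{8} \]

where $\mathbf{Q}(\mathbf{r}_l)$ is the magnetic quadrupolar term defined as the matrix formed from
the tensor product

\[ \mathbf{Q}(\mathbf{r}_l) = \int_G \mathbf{M}(\mathbf{r}_l + \mathbf{x})\, \mathbf{x}^T\, d\mathbf{x} . \tag{9} \]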
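We can rewrite (8) using the Kronecker product $\mathbf{a} \otimes \mathbf{b}$, defined as the
concatenation of the product of each element of $\mathbf{a}$ with the vector $\mathbf{b}$, and the operator
vec($\mathbf{A}$), defined as the concatenation of the columns of a matrix into a vector:

\[ b_r(\mathbf{r}) = \frac{\mu_0}{4\pi} \left[ \frac{\mathbf{r}}{r\, d(\mathbf{r},\mathbf{r}_l)^3} ,\; \frac{3\, (\mathbf{d}(\mathbf{r},\mathbf{r}_l) \otimes \mathbf{r})}{r\, d(\mathbf{r},\mathbf{r}_l)^5} \right] \cdot \begin{bmatrix} \mathbf{m} \\ \mathrm{vec}(\mathbf{Q}(\mathbf{r}_l)) \end{bmatrix} . \tag{10} \]

We therefore characterize the first-order multipole using the combination of the
magnetic dipole moment vector $\mathbf{m}$, the nine magnetic quadrupolar terms in
$\mathbf{Q}(\mathbf{r}_l)$, and the location $\mathbf{r}_l$.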
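
The step from the matrix form of the quadrupolar term to the Kronecker form rests on the identity $\mathbf{r}^T \mathbf{Q}\, \mathbf{d} = (\mathbf{d} \otimes \mathbf{r}) \cdot \mathrm{vec}(\mathbf{Q})$, with vec stacking the columns of $\mathbf{Q}$. A short numerical check is sketched below; the function names are hypothetical, and `np.kron` on one-dimensional arrays matches the concatenation definition given in the text.

```python
import numpy as np

def quadrupole_term_direct(r, Q, d):
    """Bilinear form r^T Q d."""
    return r @ Q @ d

def quadrupole_term_kron(r, Q, d):
    """Same quantity written as (d kron r) . vec(Q).

    vec(Q) stacks the columns of Q, i.e. column-major ("F") order.
    """
    return np.kron(d, r) @ Q.flatten(order="F")
```

Because the nine elements of $\mathbf{Q}$ enter linearly, they can be appended to the three elements of $\mathbf{m}$ to form a single vector of twelve linear parameters.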

We could obviously continue to expand the multipolar series to higher-order 
terms. In theory, focal sources could exist such that the leading terms of the ex- 
pansion integrate to zero, leaving only the higher-order terms. In practice, how- 
ever, our assumption that the primary activity is modeled as elemental dipoles 
restricted to the cortex minimizes our need to consider these higher terms. The 
spatial distance from the cortex to the sensors, the relative smoothness of the 
cortical surface, and the relatively high noise levels suppress these higher-order 
moments in relatively focal regions of activation. 



2.2 The Forward Problem 

The multipolar development above includes three models of assumed increasing 
spatial extent, each of which produces a radial magnetic field measurement which 
is a nonlinear function of the location (i.e. the center of expansion for the Taylor 
series) and a linear function of its moments. In the inverse problem, both the 
linear and nonlinear terms are assumed unknown. The decomposition into linear 
and nonlinear components for the current dipole model has previously been 
used to simplify nonlinear least squares fitting and localization using signal
subspace methods such as MUSIC. Since the magnetic dipole and first-order
multipole are similarly decomposed, these methods can be directly extended
to include searches for distributed non-dipolar sources. Furthermore, as noted
above, the expansions included here can be readily extended to the case of
non-radial MEG and EEG measurements for the spherical head models.

The radial magnetic field can be represented for each of the three types of
source as the inner product of a gain vector and the vector of linear parameters,
$b(\mathbf{r}) = \mathbf{g}(\mathbf{r},\mathbf{r}_l)\cdot\mathbf{l}$. The separation of nonlinear and linear parameters is clearly
shown in the current dipole expression, in (6), and in (10). We assume an MEG
array of m sensors sampling the magnetic field of the source. By concatenating
these measurements into a vector, we can represent the “forward field” of the
source as

$$ [b(\mathbf{r}_1), \ldots, b(\mathbf{r}_m)]^T = [\mathbf{g}(\mathbf{r}_1,\mathbf{r}_l), \ldots, \mathbf{g}(\mathbf{r}_m,\mathbf{r}_l)]^T\,\mathbf{l} = G(\mathbf{r}_l)\,\mathbf{l}, \qquad (11) $$

where $G(\mathbf{r}_l)$ is the “gain matrix” which accounts for all possible orientations of
the source at $\mathbf{r}_l$. The forward model for an arbitrary combination of sources
can be found by linear superposition. To extend the forward model to include 
temporal variations, we adopt the assumption that there is a finite combination
of sources that are active. The solution of the inverse problem involves estimating
the location, moment parameters and time courses of each of these sources.

It is possible for two sources to be synchronous. For example, bilateral ac- 
tivation in sensory or auditory cortex could be represented by two synchronous 
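focal dipoles, one in each hemisphere. To account for this possibility in the
subspace methods described below, we adopt an independent topography model in
which each topography consists of one or more elementary sources, all of which
have identical time courses. For a p-source topography sampled over m sensors
and n time instances, we may express the resulting m x n spatiotemporal data
matrix as

$$ \begin{bmatrix} b(\mathbf{r}_1,t_1) & \cdots & b(\mathbf{r}_1,t_n)\\ \vdots & & \vdots\\ b(\mathbf{r}_m,t_1) & \cdots & b(\mathbf{r}_m,t_n)\end{bmatrix} = [G(\mathbf{r}_{l_1}), \ldots, G(\mathbf{r}_{l_p})] \begin{bmatrix} \mathbf{l}_1(t_1) & \cdots & \mathbf{l}_1(t_n)\\ \vdots & & \vdots\\ \mathbf{l}_p(t_1) & \cdots & \mathbf{l}_p(t_n)\end{bmatrix}, \qquad (12) $$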



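where $\mathbf{l}_j(t_k)$ represents the linear parameters for the jth source sampled at the
kth time instance. Since all of these sources have the same time course, the
matrix of linear parameters is rank one and may be decomposed using an SVD
into the outer product of a single pair of singular vectors $\mathbf{u}$ and $\mathbf{v}$ scaled by the
singular value $\sigma$,

$$ \mathbf{u}\,\sigma\,\mathbf{v}^T = \begin{bmatrix} \mathbf{l}_1(t_1) & \cdots & \mathbf{l}_1(t_n)\\ \vdots & & \vdots\\ \mathbf{l}_p(t_1) & \cdots & \mathbf{l}_p(t_n)\end{bmatrix}. \qquad (13) $$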
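
A short numerical illustration of the rank-one factorization may help here. This is a sketch under simple assumptions: a six-parameter source (for example two synchronous dipoles with three moments each) and an arbitrary sinusoidal time course, neither taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, n_times = 6, 50                        # hypothetical sizes
l_fixed = rng.standard_normal(n_params)          # fixed linear parameters
time_course = np.sin(np.linspace(0.0, 3.0, n_times))
L = np.outer(l_fixed, time_course)               # all rows share one time course

U, sv, Vt = np.linalg.svd(L, full_matrices=False)
u, sigma, v = U[:, 0], sv[0], Vt[0]              # leading singular triplet
s = sigma * v                                    # scalar time series of the topography
```

Only the first singular value is nonzero (up to rounding), and `np.outer(u, s)` reproduces `L`; the sign of `u` and `s` is arbitrary but their product is not.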



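Defining the scalar time series of this independent topography to be $\mathbf{s} = \sigma\mathbf{v}$,
we may rewrite (12) as

$$ [G(\mathbf{r}_{l_1}), \ldots, G(\mathbf{r}_{l_p})]\,\mathbf{u}\,[s(t_1), \ldots, s(t_n)] = \mathbf{a}(\rho_i,\mathbf{u}_i)\,\mathbf{s}^T. \qquad (14) $$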



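The p-source topography vector is a function of the set $\rho_i$ of p source locations,
$\rho_i = \{\mathbf{r}_{l_j}\},\ j = 1, \ldots, p$, and the unit norm vector $\mathbf{u}_i$ from (13). The vector $\mathbf{u}_i$
may be viewed as a generalization of an “orientation” vector, formed by concatenating
all of the linear source parameters and scaling by its length:

$$ \mathbf{u}_i = \frac{[\mathbf{l}_1^T, \ldots, \mathbf{l}_p^T]^T}{\bigl\|[\mathbf{l}_1^T, \ldots, \mathbf{l}_p^T]^T\bigr\|}. \qquad (15) $$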



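To complete the full model for the observed MEG data we simply concatenate
the r independent topographies that make up the complete source and add noise:

$$ F = A(\rho,u)\,S^T + N = [\mathbf{a}(\rho_1,\mathbf{u}_1), \ldots, \mathbf{a}(\rho_r,\mathbf{u}_r)]\begin{bmatrix}\mathbf{s}_1^T\\ \vdots\\ \mathbf{s}_r^T\end{bmatrix} + N, \qquad (16) $$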



where each m x 1 column vector $\mathbf{a}(\rho_i,\mathbf{u}_i) = G(\rho_i)\mathbf{u}_i$ represents the ith
independent topography corresponding to the ith time series $\mathbf{s}_i$. The set $\rho$ comprises
the r sets of source locations $\{\rho_i\}$ and the set $u$ the corresponding topography
orientations $\{\mathbf{u}_i\}$. Each topography may comprise one or more multipolar
sources, but only a single time series. By our definition of independent
topographies, the matrix of time series S is rank r, and the matrix of topographies A
is assumed to be unambiguous and also of rank r. The matrix N represents
additive random noise, which we will assume to be spatially and temporally white
with zero mean and variance $\sigma^2$.



2.3 Signal Subspace 

Under the assumption that the signal is uncorrelated with the noise, the
autocorrelation matrix for the m x n spatiotemporal data in (16) is

$$ R = E\{FF^T\} = A(S^TS)A^T + n\sigma^2 I. \qquad (17) $$

The autocorrelation matrix can be expressed using an eigendecomposition as

$$ R = [\Phi_s, \Phi_e]\begin{bmatrix}\Lambda_s & 0\\ 0 & \Lambda_e\end{bmatrix}[\Phi_s, \Phi_e]^T, \qquad (18) $$

where the diagonal matrix $\Lambda_s = \Lambda + n\sigma^2 I$ represents the r largest “signal plus
noise” eigenvalues and their corresponding eigenvectors form the matrix $\Phi_s$. The
diagonal matrix $\Lambda_e = n\sigma^2 I$ represents the smallest “noise” eigenvalues and their
corresponding eigenvectors form the matrix $\Phi_e$.

We refer to $\Phi_s$ as spanning the signal subspace and to $\Phi_e$ as spanning the
noise-only subspace. In practice, we estimate the signal and noise subspace
basis vectors by an eigendecomposition of the outer product $FF^T$ or an SVD of
F. We denote the estimate of $\Phi_s$ as $\hat{\Phi}_s$.
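
The two ways of estimating the signal subspace, and the size of the noise eigenvalues, can be illustrated on synthetic data. This is only a sketch: the topographies and time series are random stand-ins rather than an MEG forward model, and the sizes and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r, sigma = 20, 5000, 3, 0.1                # sensors, samples, rank, noise std
A = rng.standard_normal((m, r))                  # stand-in topographies
S = rng.standard_normal((n, r))                  # stand-in time series
F = A @ S.T + sigma * rng.standard_normal((m, n))

# Route 1: eigendecomposition of the outer product F F^T (eigh sorts ascending)
evals, evecs = np.linalg.eigh(F @ F.T)
Phi_s_eig = evecs[:, -r:]
noise_evals = evals[:-r]

# Route 2: SVD of F (singular values sorted descending)
U, sv, _ = np.linalg.svd(F, full_matrices=False)
Phi_s_svd = U[:, :r]

# Basis vectors may differ by sign or rotation, so compare the projectors
same_subspace = np.allclose(Phi_s_eig @ Phi_s_eig.T, Phi_s_svd @ Phi_s_svd.T, atol=1e-8)
noise_level_ratio = noise_evals.mean() / (n * sigma**2)   # close to 1
```

The m - r smallest eigenvalues cluster around n·sigma^2, which is what separates them from the r "signal plus noise" eigenvalues when the SNR is adequate.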



3 Source Localization 
3.1 RAP-MUSIC 

The RAP-MUSIC algorithm is described in detail elsewhere. Here we briefly review
the method and describe its application in combination with the multipolar
models developed above. The first source is found at the location which produces
the global maximum of the metric

$$ \hat{\rho}_1 = \arg\max_{\rho}\,\bigl(\mathrm{subcorr}(G(\rho), \hat{\Phi}_s)_1\bigr). \qquad (19) $$

The function subcorr(·) represents the “subspace correlations” between the two
matrices. The subspace correlations are the ordered set of cosines of the principal
angles between the two subspaces. The first subspace correlation, subcorr(·)_1, corresponds
to the cosine of the smallest principal angle and will be unity if the two matrices
have at least a one-dimensional subspace in common. If we define $U_G$ to be
the orthogonal matrix spanning the same space as $G(\rho)$, then the squares of the
subspace correlations are found as the eigenvalues of the matrix

$$ U_G^T\,\hat{\Phi}_s\hat{\Phi}_s^T\,U_G. \qquad (20) $$
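
Subspace correlations can be computed from orthonormal bases of the two column spaces. The sketch below uses a QR factorization for the bases, which assumes both matrices have full column rank; the function name `subcorr` mirrors the text, but the implementation is an illustrative assumption rather than the authors' code.

```python
import numpy as np

def subcorr(A, B):
    """Cosines of the principal angles between span(A) and span(B), descending."""
    Ua, _ = np.linalg.qr(A)                      # orthonormal basis of span(A)
    Ub, _ = np.linalg.qr(B)                      # orthonormal basis of span(B)
    s = np.linalg.svd(Ua.T @ Ub, compute_uv=False)
    return np.clip(s, 0.0, 1.0)                  # guard against rounding above 1

rng = np.random.default_rng(0)
G = rng.standard_normal((20, 3))                 # stand-in gain matrix
shared = G @ np.array([1.0, -2.0, 0.5])          # a vector lying in span(G)
Phi = np.linalg.qr(np.column_stack([shared, rng.standard_normal((20, 3))]))[0]

c = subcorr(G, Phi)                              # c[0] is 1: one shared direction

# Squared correlations equal the eigenvalues of U_G^T Phi Phi^T U_G
U_G = np.linalg.qr(G)[0]
eig = np.sort(np.linalg.eigvalsh(U_G.T @ Phi @ Phi.T @ U_G))[::-1]
```

Because `shared` lies in both subspaces, the first subspace correlation is unity, while the remaining ones are well below one for these random matrices.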

By maximizing the first subspace correlation in (19), we identify the source
location and corresponding gain matrix that has the smallest principal angle with 
respect to the signal subspace. Since we only need to search over the location 
parameter, a nearly exhaustive search over a relatively dense three-dimensional 
grid within the brain volume can be performed relatively quickly for any of the 
three source models of the previous section. For the case of synchronous sources, 
the dimensionality of the search increases by at least a factor of two and the 
computational cost rises dramatically, but the procedure nonetheless proceeds 
directly. 

To complete the first independent topography model, we need the corresponding
source orientation vector, which is a simple linear transformation of
the eigenvector of (20) corresponding to the maximum eigenvalue. The resulting
estimates yield the first estimated independent topography, $\mathbf{a}(\hat{\rho}_1,\hat{\mathbf{u}}_1) = G(\hat{\rho}_1)\hat{\mathbf{u}}_1$.

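For each of the remaining k = 2, . . . , r RAP-MUSIC recursions, the nonlinear
source location parameters are found as

$$ \hat{\rho}_k = \arg\max_{\rho}\,\Bigl(\mathrm{subcorr}\bigl(\Pi^{\perp}_{\hat{A}_{k-1}} G(\rho),\ \Pi^{\perp}_{\hat{A}_{k-1}} \hat{\Phi}_s\bigr)_1\Bigr), \qquad (21) $$

where $\hat{A}_{k-1} = [\mathbf{a}(\hat{\rho}_1,\hat{\mathbf{u}}_1), \ldots, \mathbf{a}(\hat{\rho}_{k-1},\hat{\mathbf{u}}_{k-1})]$ represents the composite independent
topography matrix, and the projection operator $\Pi^{\perp}_{\hat{A}_{k-1}}$ is computed as

$$ \Pi^{\perp}_{\hat{A}_{k-1}} = I - \hat{A}_{k-1}\hat{A}_{k-1}^{\dagger}, \qquad (22) $$

where $\hat{A}_{k-1}^{\dagger} = (\hat{A}_{k-1}^T\hat{A}_{k-1})^{-1}\hat{A}_{k-1}^T$ is the pseudoinverse of $\hat{A}_{k-1}$. Through this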
recursion, we sequentially remove the components of the signal subspace that 
can be explained by the sources that have already been found. We then search 
the remaining signal subspace for additional sources. 
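
The recursion can be sketched on a toy problem. In the code below the "gain matrices" are random stand-ins for a real MEG forward model, the candidate grid has 40 hypothetical locations, and the helper names (`orth`, `first_subcorr`, `rap_music`) are illustrative assumptions, not the authors' implementation. The orientation is recovered by a least-squares solve rather than the eigenvector transformation mentioned in the text.

```python
import numpy as np

def orth(A, tol=1e-10):
    """Orthonormal basis of span(A), dropping numerically null directions."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, s > tol * max(s[0], 1e-300)]

def first_subcorr(G, Phi):
    """Largest subspace correlation and the vector in span(G) that attains it."""
    Ug, Up = orth(G), orth(Phi)
    X, s, _ = np.linalg.svd(Ug.T @ Up)
    return s[0], Ug @ X[:, 0]

def rap_music(gains, Phi_s, n_sources):
    m = Phi_s.shape[0]
    P = np.eye(m)                                # projector away from found topographies
    found, A = [], np.zeros((m, 0))
    for _ in range(n_sources):
        scores = [first_subcorr(P @ G, P @ Phi_s)[0] for G in gains]
        k = int(np.argmax(scores))
        _, v = first_subcorr(P @ gains[k], P @ Phi_s)
        u = np.linalg.lstsq(P @ gains[k], v, rcond=None)[0]
        u /= np.linalg.norm(u)                   # unit-norm "orientation"
        A = np.column_stack([A, gains[k] @ u])   # append topography a = G u
        P = np.eye(m) - A @ np.linalg.pinv(A)    # I - A A^+
        found.append(k)
    return found, A

rng = np.random.default_rng(1)
m, n = 30, 200
gains = [rng.standard_normal((m, 3)) for _ in range(40)]
true_idx = [5, 17]
A_true = np.column_stack([gains[i] @ rng.standard_normal(3) for i in true_idx])
S = rng.standard_normal((n, 2))
F = A_true @ S.T + 1e-3 * rng.standard_normal((m, n))
Phi_s = np.linalg.svd(F, full_matrices=False)[0][:, :2]

found, A_hat = rap_music(gains, Phi_s, 2)
```

In this nearly noiseless toy case both true locations are recovered; the order in which they are found is not meaningful.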

At each iteration the source location set $\rho$ in (21) may represent one or more
multipolar sources. To find the simplest sources consistent with the data, we 
begin the search with the current dipole model, then progress through the mag- 
netic and first-order multipole models. The decision to increase the complexity 
of the model is based on a minimum correlation threshold. In this paper, we 
will restrict the search to one-source models only, halting the recursion when the 
first-order multipole maximum subspace correlation drops too low. Examples of 
different correlation thresholds are given in the Monte Carlo simulations in the 
next section. Extensions to multiple synchronous dipolar sources are discussed
elsewhere, with obvious extensions to multiple multipolar sources.

With all sources in the data identified and their independent topographies
represented in the final topography matrix $\hat{A}_r$, estimates of the corresponding
time series are readily found as $\hat{S}^T = (\hat{A}_r^T\hat{A}_r)^{-1}\hat{A}_r^T F$ or in some regularized
form thereof.
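
The least-squares time series estimate, and one possible regularized variant, can be sketched as follows. The Tikhonov form and the value of `lam` are assumptions chosen for illustration; the text does not specify which regularization is intended.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 30, 100, 3
A = rng.standard_normal((m, r))                  # stand-in topography matrix
S = rng.standard_normal((n, r))                  # true time series (columns)
F = A @ S.T + 0.01 * rng.standard_normal((m, n))

# Plain least squares: S^T = (A^T A)^{-1} A^T F
S_hat_T = np.linalg.solve(A.T @ A, A.T @ F)

# Tikhonov-regularized variant (an assumption, one of many possible choices)
lam = 1e-3
S_reg_T = np.linalg.solve(A.T @ A + lam * np.eye(r), A.T @ F)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse; for ill-conditioned topography matrices the regularized form is the safer choice.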

3.2 Mapping Parametric Sources onto Cerebral Cortex

The linear parameters of the multipolar model computed using the RAP-MUSIC 
search are estimates of the moments formed by integrating the primary current 
sources as defined in (P). When the sources are confined to cortex, which we 
can represent as a continuous surface, the moments are generated as integrals 
over a surface patch containing the sources. For the single-source topographies 
considered here, we assume that each source represents the activation of a single 
contiguous cortical patch. The final step in our parametric imaging method 
is then to relate the multipolar moments back to a plausible distribution on 
the cortical surface which consists of a set of patches of activation consistent 
with the estimated moments. Fitting the moments to sources on the cortex 
involves estimation of both the surface patch and the current distribution on that 
patch. As with the original MEG inverse problem, the solutions are ambiguous. 
However, under the assumption that each surface patch is contiguous and in the 
vicinity of the estimated multipole, the degree of ambiguity is greatly reduced. 

To perform the final stage of the multipolar imaging method we use a finely 
tessellated cortical surface extracted from an MRI volume. In fitting the
multipolar sources to the cortex, we allow a current element at the vertex of each
triangular patch on the surface, with an orientation derived as a weighted sum 
of the triangular normals adjacent to the vertex. To fit a specific multipolar 
source with topography a{pi,Ui) to the cortical surface, we begin by creating a 
list of candidate locations on the cortex in the vicinity of the source location. 
For each candidate point, we test the subspace correlation between the point 
and the topography. If the point with the highest correlation meets a minimum 
threshold (e.g. 98%), we designate it as the corresponding re-mapped cortical 
source for that topography and halt. Otherwise, we add adjacent points to each 
of the candidate points to form small distributed patches and continue to swell 
each candidate point until we find a patch that meets the threshold. 

This approach will generate a patch of minimal size consistent with the iden- 
tified topography. We may continue to swell the patch and find additional pos- 
sible sources consistent with the topography, a consequence of the ambiguity in 
the inverse problem rather than a specific limitation of the method described. 
Currently we grow the patch by adding a ring of triangles around the elements 
already in the patch. A more sophisticated approach based on testing a number 
of possible candidates to add to each patch may prove more robust. Alterna- 
tively, we could adopt a stochastic model for the mapping between the estimated 
multipolar parameters and the corresponding cortical activation. This approach 
could readily incorporate the activation models described in our previous work
on Bayesian MEG imaging.

4 Monte Carlo Simulations 

In the first simulation we used the tessellated human cortex shown in Fig. 2,
which contains approximately 230,000 triangles. Radial magnetic field sensors
and a spherical forward model were used in the generation of the simulated
data and in the inverse method. Three distributed sources were created on the
cortical surface, also shown in Fig. 2. The three sources were given overlapping
independent time courses as shown in Fig. 3. The forward magnetic field was
measured by a simulated array of 104 magnetometers spaced approximately
uniformly on the upper hemisphere at a radius of 12 cm. Zero mean Gaussian
white noise was added to the sensor data at a ratio of 100:1 signal to noise
variance.

Fig. 2. (a) The ground truth for the simulation study showing mappings of the three
sources onto the cortical surface; (b) Reconstruction of the cortical activity using the
multipolar method; (c) Reconstruction of the data from time t = 10 using a regularized
minimum L2 norm method

Although analysis of the singular value spectrum of this high SNR data
clearly revealed a rank of three, we overspecified the rank to ten to demonstrate
robustness to selecting too great a rank. We set the acceptance threshold for
correlation at 98%. The RAP-MUSIC algorithm was first run with the simplest
of the source topographies, the current dipole, for which a maximum
correlation of 99.9% was found. On the second recursion, the correlation with the
dipole model dropped below the threshold of 98%. We therefore increased the
complexity of the model to the magnetic dipole (6) and achieved a correlation
of 98.3%. The third recursion was below the threshold for the magnetic dipole,
so we increased the model to a first-order multipole (10) to obtain a correlation
of 99.9%. On the fourth recursion, the correlation plummeted to 62% for the
multipole and the recursion was halted at three sources. The three topographies
found were then used in a least-squares fit to determine the time series of the
three sources (Fig. 3).

We mapped the three topographies into the minimal cortical source regions,
also shown in Fig. 2. For comparison we also include a regularized minimum
L2-norm solution fitted at one of the intermediate time slices, for which the
spatial distribution and time series are also shown in Fig. 2 and Fig. 3. We see
that although the re-mapped topographies obtained using the multipolar method
are not identical to the “ground truth” they are indeed similar. In comparison,
the minimum norm solution exhibits substantial source blurring due to the low
resolution of the linear inverse methods.

Fig. 3. Time courses for the three sources (a) ground truth; (b) time courses estimated
using the multipolar method; (c) time courses averaged over each of the true activation
areas computed from the minimum norm solutions. In this high SNR example, the time
series reconstruction in (b) is nearly perfect, while (c) exhibits high noise sensitivity

As discussed above, the multipolar source center is assumed to be near the 
distributed cortical source. We tested this assumption in a Monte Carlo simu- 
lation of 10,800 distributed sources over a range of noise levels. We also tested 
the effects of the correlation threshold parameter used in the RAP-MUSIC algo- 
rithm to accept a model. Each source was centered randomly on the upper half 
of the brain surface in Fig. 2. With a 50% probability, each source was either
a “monophasic” contiguous patch of 200 mm² or a “biphasic” patch of two 200
mm² patches centered about 8 mm apart (about 50% overlap) and of opposite
polarity. Each Monte Carlo realization simulated three such sources with over- 
lapping non-orthogonal time series. No attempt was made to force the three 
sources to be widely separated, so that source overlaps were possible in any sin- 
gle realization. A hemispherical array of 138 magnetometers was simulated a few 
centimeters above the cortical surface. Although the true signal subspace rank 
was three, we intentionally selected a larger rank of five for each realization. 

Twelve cases of SNR and correlation threshold were tested, with 300 Monte 
Carlo realizations per case, for a total number of 10,800 sources. For each simu- 
lated source, we determined the geometric centroid of the patch. We then com- 
puted the distance from this centroid to the multipolar source location nearest 
to the source as an indication of the accuracy of the estimate. However, we note 
that the multipole that gives the best fit to a particular distributed cortical 
source does not necessarily lie on the cortex. 

The global statistics presented in Table 1 show that the current dipolar
locations are in general closer to the patch centroids than the non-dipolar locations.
The 20 dB SNR case represents a mostly noiseless signal to allow observations of
the modeling effects. Even though the sources were spatially large, the majority
of the monophasic and some of the biphasic sources were modeled quite well
as dipoles, even at the 99% correlation level. The first-order multipole model
accounted for the remainder. The 3 dB SNR case represents a rather severe case
of 67% signal variance to 33% noise variance. At 99% correlation, most sources
are lost in the noise, but at the lower correlation thresholds we see the majority
of sources still detected quite well as either dipoles or multipoles. Although we
intentionally set too large a rank for the signal subspace, we also note the
important fact that no spurious sources were found, i.e. we never saw more than three
sources. As we might expect, the effect of lowering the correlation threshold is
to allow more sources to be detected, but at the cost of greater mean distance
between the source locations and the patch centroids.

Table 1. Monte Carlo Study. SNR is ten times the log base-ten of the ratio of the
total signal variance to the total noise variance, both values measured at the array of
sensors. Correlation threshold is the minimum subspace correlation value for the model
to be accepted. The first row summarizes the results over all trials for a total of 10,800
sources localized. Each additional row represents a different Monte Carlo trial of 300
realizations and 900 sources. The sources are described in the text. The mean and
standard deviation (in mm) for the solution distances are given for the ECD model
and the non-ECD (magnetic dipoles and first-order multipoles combined). The final
column gives the number not localized at the given threshold.

SNR    Correlation   Number    ECD distance       Number of   Non-ECD distance   Missing
(dB)   Threshold     of ECDs   (Mean, Std.Dev)    Non-ECDs    (Mean, Std.Dev)    Sources
ALL    ALL           6659      (5.34, 4.56)       2977        (7.06, 5.98)       1164
3      0.94          643       (6.51, 5.38)       183         (6.05, 5.81)       74
3      0.96          565       (5.69, 4.03)       215         (6.28, 6.58)       120
3      0.98          378       (5.17, 3.88)       282         (7.25, 6.59)       240
3      0.99          65        (6.32, 3.98)       220         (14.58, 8.37)      615
10     0.94          698       (6.35, 5.81)       198         (4.85, 6.70)       4
10     0.96          641       (5.68, 4.86)       254         (4.43, 5.55)       5
10     0.98          575       (4.47, 3.15)       302         (4.20, 3.41)       23
10     0.99          489       (3.99, 2.86)       332         (4.41, 3.57)       79
20     0.94          737       (6.03, 5.60)       163         (4.99, 6.72)       0
20     0.96          702       (4.97, 4.17)       198         (4.73, 5.27)       0
20     0.98          625       (4.69, 3.94)       275         (3.96, 3.56)       0
20     0.99          541       (4.26, 3.38)       355         (3.96, 2.97)       4
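
The SNR figures quoted for the Monte Carlo study follow directly from the decibel definition used in Table 1 (ten times the base-ten logarithm of the signal-to-noise variance ratio). A small worked example, with the function name `signal_fraction` chosen here purely for illustration:

```python
def signal_fraction(snr_db):
    """Fraction of total variance due to signal for a given SNR in dB."""
    ratio = 10.0 ** (snr_db / 10.0)          # signal variance / noise variance
    return ratio / (1.0 + ratio)

fractions = {db: signal_fraction(db) for db in (3, 10, 20)}
# 3 dB gives a variance ratio of about 2:1, i.e. roughly 67% signal and 33% noise
```

The 20 dB case corresponds to the 100:1 variance ratio used in the first simulation, where about 99% of the variance is signal.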



5 Conclusion 

We have described an algorithm for computing estimates of cortical current 
activity from MEG data. The method exploits the low dimensionality of para- 
metric multipolar models to estimate the locations of equivalent representations 
of the current sources. These representations are then mapped onto a tessellated 
representation of the cortical surface resulting in a spatiotemporal estimate of 
cortical activity. Monte Carlo simulations indicate the potential of this
method to extend the parametric approach to the representation of more
distributed sources. The resulting images avoid the very low resolution encountered
using minimum norm methods and the high computational costs of the other
nonlinear imaging methods. Planned studies include experimental phantoms and
human studies of self-paced and visually cued motor activation.

Abstract. The problem of reconstructing a binary image (usually an
image in the plane and not necessarily on a Cartesian grid) from a few
projections translates into the problem of solving a system of equations 
which is very underdetermined and leads in general to a large class of 
solutions. It is desirable to limit the class of possible solutions, by using 
appropriate prior information, to only those which are reasonably typical 
of the class of images which contains the unknown image that we wish to 
reconstruct. One may indeed pose the following hypothesis: if the image 
is a typical member of a class of images having a certain distribution, 
then by using this information we can limit the class of possible solu- 
tions to only those which are close to the given unknown image. This 
hypothesis is experimentally validated for the specific case of a class of 
binary images representing cardiac cross-sections, where the probability 
of the occurrence of a particular image of the class is determined by a 
Gibbs distribution and reconstruction is to be done from the three noisy 
projections. 



1 Introduction 

The subject matter of this paper is the recovery of binary images from their 
projections. A binary image is a rectangular array of pixels, each one of which 
is either black or white. In the case of cardiac angiography, we can represent a 
section through the heart as a binary image in which white is assigned to those 
pixels which contain contrast material. A projection of a binary image is defined 
as a data set, which for every line (in a set of parallel lines, each of which goes 
through the center of every pixel which it intersects at all) tells us, at least 
approximately, how many white pixels are intersected by that line. According 
to this definition there can be only four projections: one horizontal, one vertical 
and two diagonal. There exist more general definitions of projections in the
literature, but it is typical for many applications that only a few projections
are available.
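
Under the definition above, the four projections of a binary image are simply its row sums, column sums and the sums along the two diagonal directions. A minimal sketch for noiseless projections follows; the function name `projections` and the example image are illustrative assumptions.

```python
import numpy as np

def projections(img):
    """Horizontal, vertical and two diagonal projections of a binary image."""
    img = np.asarray(img)
    h, w = img.shape
    rows = img.sum(axis=1)                       # one count per horizontal line
    cols = img.sum(axis=0)                       # one count per vertical line
    offsets = range(-(h - 1), w)
    diag = np.array([np.trace(img, offset=k) for k in offsets])
    anti = np.array([np.trace(img[:, ::-1], offset=k) for k in offsets])
    return rows, cols, diag, anti

img = np.array([[1, 0, 1],
                [0, 1, 1]])
rows, cols, diag, anti = projections(img)
```

Every projection counts each white pixel exactly once, so all four sum to the total number of white pixels in the image.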

The problem of binary tomography is the recovery of a binary image from 
its projections. This problem can be represented by a system of equations which 
is very underdetermined and leads typically to a large class of solutions. It is
desirable to reduce the class of possible solutions to only those which are
reasonably “close” to the (unknown) image which gave rise to the measurement data.
Appropriate prior information on the image may be useful for this task. In
addition to the inherent information in binary tomography that there are only
two possible values, Gibbs priors describing the local behavior/character of
the image can also provide useful information. We pose the hypothesis that, for 
certain Gibbs distributions, knowledge that the image is a random element from 
the distribution is sufficient for limiting the class of possible solutions to only 
those which are close to the (unknown) image which gave rise to the measure- 
ment data. 

Binary images can be described in many applications by the following simpli- 
fied characterization: a set of objects - “white” regions - are located in a “black” 
background. (We adopt the convention that 1 represents white and 0 represents 
black.) This can be easily translated into Gibbs distributions by using a set of 
configurations of neighboring image elements and assigning a value (which is an 
indicator of the likelihood of occurrence) to each of these configurations. 

One type of test presented in this paper is motivated by the task of
reconstructing semiconductor surface layers from a few projections. Fishburn et al. [3]
designed three test phantoms for assessing the suitability of binary tomography
for that task. These phantoms have been recently used in the binary tomography
literature by several other researchers. The common experience
reported by these researchers is that knowing the horizontal, vertical and
one diagonal projection is not sufficient for exact recovery of such phantoms.
However, it is shown in [7] that an algorithm, which makes use of an appropriate
Gibbs prior, correctly recovers the test phantoms of [3] from three projections.

The following section introduces Gibbs distributions and discusses their def- 
inition using a look-up table. A reconstruction algorithm based on two given 
perfect projections and a Gibbs prior is presented in the third section, where it 
is also illustrated for the phantoms of [3] that the algorithm (while achieving its
mathematical aim) fails to recover the original object. Since three projections are 
sufficient to recover these test phantoms based on semiconductor surface layers, 
it appears possible that three projections would also be sufficient for the recovery 
of cardiac cross-sections. An algorithm to do this is presented in Section 4; this 
algorithm does not assume that the projections are noiseless. Its performance is 
investigated in Section 5, where the influence of noise is also demonstrated. The 
final section presents our conclusions. 

2 Gibbs Distributions Associated with Binary Images 

Local properties of a given binary image $\omega$ defined on H pixels (each pixel is
indexed by an integer h, $1 \le h \le H$, and $\omega(h)$ is either black or white) can be
characterized by a Gibbs distribution of the form

$$ \Pi(\omega) = \frac{1}{Z}\exp\Bigl(\beta\sum_{h=1}^{H} I_h(\omega)\Bigr), \qquad (1) $$
where $\Pi(\omega)$ is the probability of occurrence of the image $\omega$, Z is the normalizing
factor (which ensures that $\Pi$ is a probability density function; i.e. that the sum
of $\Pi(\omega)$ over all possible binary images is 1), $\beta$ is a parameter defining the
“peakedness” of the Gibbs distribution (this is one of the parameters controlling
the appearance of the typical images), and $I_h(\omega)$ is the “local energy function”
for the pixel indexed by h, $1 \le h \le H$. The local energy function is defined in
such a way that it encourages certain local configurations, such as uniform white 
or black clusters of pixels and configurations forming edges or corners. Each of 
these configurations can be encouraged to a different extent by assigning to 
them a specific value. In this paper we have adopted the convention that the 
local energy function at a pixel depends only on its own color and those of its 
eight neighbors. Thus, the color of a particular pixel influences the value of the 
local energy function of only itself and its eight neighbors. 

Appropriate definition of the local energy function plays an important role 
in successful image recovery. The definition should reflect the characteristics of 
a typical image of the particular application area. There are many possible ways 
of defining the local energy function. One of them is to use a look-up table 
which contains a value for each possible configuration. (In our case, there are 
512 possible configurations.) Given an ensemble of typical images for a particular 
application (a training set), the look-up table can be created by counting the 
number of times each particular configuration appears in the images. Then the 
$I_h(\omega)$ of (1) is defined as $\ln(q+1)$, where q is the value in the look-up table of the
local configuration in the image $\omega$ at the pixel h. The usefulness of the resulting
prior depends on the size of the training set (the larger, the better) and on how 
representative the images in the training set are for the application area. 
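
The look-up table construction can be sketched as follows. Each 3×3 neighbourhood is encoded as a 9-bit integer (hence 512 configurations), the table counts occurrences over a training set, and the local energy is ln(q + 1). The treatment of image borders (zero padding, i.e. black outside the image), the bit ordering and all function names are assumptions made for this illustration.

```python
import numpy as np

def config_codes(img):
    """9-bit code of the 3x3 neighbourhood of every pixel (zero padding assumed)."""
    p = np.pad(np.asarray(img, dtype=np.int64), 1)
    h, w = p.shape[0] - 2, p.shape[1] - 2
    code = np.zeros((h, w), dtype=np.int64)
    bit = 0
    for dy in range(3):
        for dx in range(3):
            code += p[dy:dy + h, dx:dx + w] << bit
            bit += 1
    return code

def build_table(training_images):
    """Count how often each of the 512 configurations occurs in the training set."""
    table = np.zeros(512, dtype=np.int64)
    for img in training_images:
        table += np.bincount(config_codes(img).ravel(), minlength=512)
    return table

def log_prior(img, table, beta=1.0):
    """beta * sum_h ln(q_h + 1), i.e. ln Pi(omega) up to the constant -ln Z."""
    return beta * np.log(table[config_codes(img)] + 1.0).sum()

blob = np.zeros((8, 8), dtype=int)
blob[2:6, 2:6] = 1                               # one solid white object
checker = np.indices((8, 8)).sum(axis=0) % 2     # pattern unlike the training image
table = build_table([blob])
```

With this tiny training set, an image resembling the training data scores higher than a checkerboard, whose neighbourhoods never occur in the table.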



3 Biplane Tomography: Preliminary Experiments 

Ryser showed in the 1950’s that if one matrix of 0’s and 1’s has the same
row and column sums as another such matrix then the first matrix can be
transformed into the second by a finite sequence of simple switching operations each
of which changes two 1’s to 0’s and two 0’s to 1’s and leaves the row and column
sums unaltered. This can be regarded as a result of binary tomography, since
matrices of 0’s and 1’s can be viewed as binary images; two matrices that have
the same row and column sums correspond to two binary images which have
the same horizontal and vertical projections. We refer to such images as being
tomographically equivalent. The simple switching operation described above will
be referred to as a rectangular 4-switch.
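
A rectangular 4-switch acts on the four corners of a rectangle whose corner values alternate, and it leaves all row and column sums unchanged. A minimal sketch (the function name `four_switch` and the example image are illustrative assumptions):

```python
import numpy as np

def four_switch(img, r1, r2, c1, c2):
    """Apply a rectangular 4-switch on rows (r1, r2) and columns (c1, c2).

    Returns the switched image, or None if the four corners do not form
    one of the two switchable patterns.
    """
    corners = img[np.ix_([r1, r2], [c1, c2])]
    switchable = (np.array_equal(corners, [[1, 0], [0, 1]])
                  or np.array_equal(corners, [[0, 1], [1, 0]]))
    if not switchable:
        return None
    out = img.copy()
    out[np.ix_([r1, r2], [c1, c2])] = 1 - corners    # swap the 1's and 0's
    return out

img = np.array([[1, 0, 0],
                [0, 0, 1],
                [0, 1, 0]])
switched = four_switch(img, 0, 1, 0, 2)
```

The switched image differs from the original in exactly four pixels but has the same horizontal and vertical projections, so the two are tomographically equivalent.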

Let C be any tomographic equivalence class of binary images. Consider the
graph whose vertices are in 1 - 1 correspondence with the binary images in C,
in which two vertices are adjacent if and only if the image corresponding to
one vertex can be obtained from the image corresponding to the other by a
single rectangular 4-switch. We will call this the Ryser graph of the tomographic
equivalence class C.

The Ryser graph is a finite graph since each tomographic equivalence class
is finite. In view of Ryser’s result, the Ryser graph is connected.

We now give an application of the Ryser graph. Let V be the set of all binary
images. Consider the following problem: Given a binary image $\omega \in V$, find an
image in $\omega$’s tomographic equivalence class for which $\Pi(\omega)$ of (1) has a relatively
high value. (Ideally, we would like to find an image that maximizes $\Pi(\omega)$, but
we do not expect to always achieve this.)

Kong and Herman [9] describe (two versions of) an iterative stochastic
algorithm to do this. The algorithm is a typical instance of a class of algorithms
known in the literature as Metropolis algorithms. Since such algorithms are
often time consuming, [9] devotes a considerable amount of space to the
achievement of a relatively efficient implementation. The essential idea is to first find a
single binary image which satisfies the two given projections and then iteratively
investigate the effect on $\Pi(\omega)$ of making a random rectangular 4-switch.

Roughly speaking, a single step in the Metropolis procedure starts with
“randomly picking” a possible rectangular 4-switch for the current image $\omega_1$. Let $\omega_2$
be the image that is obtained by performing this rectangular 4-switch on $\omega_1$.
Let p be the ratio of $\Pi(\omega_2)$ to $\Pi(\omega_1)$. The single step of the iterative procedure
is completed by replacing $\omega_1$ by $\omega_2$ if p is greater than 1, and replacing $\omega_1$ by
$\omega_2$ with probability p (and hence retaining $\omega_1$ with probability 1 - p) if p is less
than 1. As explained in [9], properties of Ryser graphs and of the Metropolis
algorithms guarantee that the procedure just described will produce images $\omega$
with relatively high values of $\Pi(\omega)$; for a precise statement (as well as for a
discussion of implementational concerns), see [9].
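
A single Metropolis step of this kind can be sketched as below. Two simplifications are assumptions of this illustration and not the procedure of [9]: the energy is a simple stand-in (the number of equal horizontal and vertical neighbour pairs) rather than a look-up table prior, and a random rectangle whose corners are not switchable is treated as a step that keeps the current image.

```python
import numpy as np

def log_pi(img, beta=1.0):
    """Stand-in for beta * sum_h I_h(omega): counts equal 4-neighbour pairs."""
    same_h = np.sum(img[:, 1:] == img[:, :-1])
    same_v = np.sum(img[1:, :] == img[:-1, :])
    return beta * (same_h + same_v)

def metropolis_step(img, rng, beta=1.0):
    h, w = img.shape
    r1, r2 = rng.choice(h, size=2, replace=False)
    c1, c2 = rng.choice(w, size=2, replace=False)
    a, b, c, d = img[r1, c1], img[r1, c2], img[r2, c1], img[r2, c2]
    if not (a == d and b == c and a != b):
        return img                               # not a valid 4-switch, keep image
    cand = img.copy()
    cand[[r1, r1, r2, r2], [c1, c2, c1, c2]] = [b, a, d, c]
    p = np.exp(log_pi(cand, beta) - log_pi(img, beta))   # ratio Pi(cand) / Pi(img)
    return cand if rng.random() < p else img     # always accepted when p >= 1

rng = np.random.default_rng(0)
img0 = (rng.random((8, 8)) < 0.5).astype(int)
img = img0
for _ in range(2000):
    img = metropolis_step(img, rng, beta=1.0)
```

Every accepted move is a rectangular 4-switch, so the chain never leaves the tomographic equivalence class of the starting image: the row and column sums of `img` and `img0` agree after any number of steps.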

In order to test out our ideas on reconstructions from two projections, we
implemented the algorithms described in [9] and applied them to the binary
images in [3] representing semiconductor surface layers. (For these experiments,
the look-up table was created using the three phantoms of [3].) For all three
phantoms (these are shown on the left of Figs. 1, 2 and 3, respectively), the
algorithms of [9] performed “too well” in the sense that the reconstructed images
(these are shown on the right of Figs. 1, 2 and 3, respectively) have a higher
value of $\Pi(\omega)$ than the originals. One might say after looking at these figures
that the reconstructions are versions of the original binary images in which the
boundaries have been smoothed.

As a result of these preliminary experiments combined with the fact that 
all three phantoms of [3] were perfectly recovered when Gibbs priors were com- 
bined with three perfect projections [7], we decided to investigate the efficacy of 
triplane rather than biplane cardio-angiography. 

Fig. 1. Phantom 1 (left) and its reconstruction (right) based on Ryser graphs and a
Metropolis algorithm from perfect horizontal and vertical projections; . and 1 represent
the values zero and 1 (respectively) in the phantom and at correctly reconstructed
locations; - and * represent incorrectly reconstructed values of zero and one
(respectively); the total number of incorrectly reconstructed pixels is 12

Fig. 2. Phantom 2 (left) and its reconstruction (right) based on Ryser graphs and a
Metropolis algorithm from perfect horizontal and vertical projections; . and 1 represent
the values zero and 1 (respectively) in the phantom and at correctly reconstructed
locations; - and * represent incorrectly reconstructed values of zero and one
(respectively); the total number of incorrectly reconstructed pixels is 8

Fig. 3. Phantom 3 (left) and its reconstruction (right) based on Ryser graphs and a
Metropolis algorithm from perfect horizontal and vertical projections; . and 1 represent
the values zero and 1 (respectively) in the phantom and at correctly reconstructed
locations; - and * represent incorrectly reconstructed values of zero and one
(respectively); the total number of incorrectly reconstructed pixels is 90



4 A Reconstruction Algorithm for Three Noisy 
Projections 



Assume that our data consist of estimates of three (horizontal, vertical and one 
diagonal) projections of an image, which we believe to be a random sample from a 
known Gibbs distribution. Then a reconstruction algorithm should find an image 
which is not only consistent with the data, but which is also a typical sample 
from the known Gibbs distribution. We use a modified Metropolis algorithm 
in which the search for a likely image is altered to take also into account the
effect of replacing ω₁ by ω₂ on the consistency with the given projection data.
Roughly speaking, if the data inconsistency is increased or decreased, then the
change is discouraged or encouraged, respectively. The relative influence of the
data inconsistency is controlled by a parameter α (α > 0).

To be exact, the Metropolis algorithm is modified as follows. First, since it
may no longer be possible to find a binary image which satisfies our (noisy)
projection data exactly, we do not attempt to start the iterative process with
such an image. (In the experiments which are reported below, the initial image
is always totally black.) Second, in the iterative step, the current image ω₁ is
changed into ω₂ by randomly picking a single pixel h₁ and changing its color.
The role of p is replaced by

    p' = e^{\beta \left( \left\{ \sum_{h \in N(h_1)} [ I_h(\omega_2) - I_h(\omega_1) ] \right\} - \alpha \{ F_{h_1}(\omega_2) - F_{h_1}(\omega_1) \} \right)},    (2)

where N(h₁) is the set of at most nine pixels consisting of h₁ and its neighbors
and

    F_{h_1}(\omega) = | d_{h_1}(\omega) - m_{h_1} |,    (3)

    d_{h_1}(\omega) = \sum_{i=1}^{3} d^i_{h_1}(\omega),    (4)

    m_{h_1} = \sum_{i=1}^{3} m^i_{h_1},    (5)

where d^i_{h_1}(ω) is the number of white pixels in image ω on the line going in
the direction i through the pixel h₁ and m^i_{h_1} is the value of the
corresponding item in the given projection data. Finally, ω₂ may, or may not,
replace ω₁ as determined by the Metropolis principle with p' defined as in (2).
To be exact, ω₁ is replaced by ω₂ if p' is greater than 1 and ω₁ is replaced by
ω₂ with probability p' (and hence ω₁ is retained with probability 1 - p') if p'
is less than 1.
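
A minimal sketch of one modified Metropolis step, following (2)-(5), is given below. The assumptions are labeled in the code: the image is square and stored as a list of lists of 0/1, the diagonal direction is taken along anti-diagonals (r + c constant), and `local_energy` is a hypothetical callable standing in for the look-up-table value I_h(ω). The projections are recomputed from scratch for clarity rather than updated incrementally.

```python
import math
import random


def projections(img):
    """Row, column and anti-diagonal white-pixel counts of a square image.
    (The choice of anti-diagonals is an assumption of this sketch.)"""
    n = len(img)
    rows = [sum(row) for row in img]
    cols = [sum(img[r][c] for r in range(n)) for c in range(n)]
    diags = [0] * (2 * n - 1)
    for r in range(n):
        for c in range(n):
            diags[r + c] += img[r][c]
    return rows, cols, diags


def inconsistency(img, r, c, data):
    """F_h of (3)-(5) for the pixel h = (r, c): |d_h(omega) - m_h|."""
    rows, cols, diags = projections(img)
    m_rows, m_cols, m_diags = data
    d = rows[r] + cols[c] + diags[r + c]
    m = m_rows[r] + m_cols[c] + m_diags[r + c]
    return abs(d - m)


def neighbourhood_energy(img, r, c, local_energy):
    """Sum of the local energy over N(h): the pixel and its (up to 8) neighbors."""
    n = len(img)
    return sum(local_energy(img, i, j)
               for i in range(max(0, r - 1), min(n, r + 2))
               for j in range(max(0, c - 1), min(n, c + 2)))


def modified_step(img, r, c, local_energy, data, alpha, beta, rng):
    """Flip pixel (r, c) and keep the flip according to p' of (2).
    Mutates img in place and returns it."""
    e1 = neighbourhood_energy(img, r, c, local_energy)
    f1 = inconsistency(img, r, c, data)
    img[r][c] ^= 1  # tentative colour change
    e2 = neighbourhood_energy(img, r, c, local_energy)
    f2 = inconsistency(img, r, c, data)
    log_p = beta * ((e2 - e1) - alpha * (f2 - f1))  # log of p'
    if log_p < 0 and rng.random() >= math.exp(log_p):
        img[r][c] ^= 1  # rejected: restore the previous colour
    return img
```

In a full run, the pixel (r, c) would be drawn at random in every step, starting from the totally black image.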

Such a procedure is guided preferentially towards images which have relatively
large probability, as defined by (1), and are at the same time not too
inconsistent with the projection data. The procedure is run for a “long time”
(see below) and at its termination we select as its output that image from the
sequence produced by it which has the maximum probability (1).

5 Triplane Tomography: Application to Cardiac 
Angiography 

For this application, we have identified a statistical ensemble of mathematically
described images based on cardiac cross-sectional images. These images all
consisted of three geometrical objects (an ellipse representing the left ventricle, a
circle representing the left atrium and the difference between two circular sectors
representing the right ventricle) of statistically variable size, shape and location.
By assigning white to every pixel whose center is inside one of these objects (and
black to every other pixel) each mathematically described image gives rise to a
binary image; we refer to such binary images as “phantoms”. (The reason why
the binary assumption is justified is that the intended application is subtraction
angiography in which the projection data are obtained by subtracting a
pre-injection x-ray picture from a post-injection x-ray picture; the difference is
the projection data of the image containing either the injected contrast material
or nothing.)

Fig. 4. Two of the 10 phantoms from the training set

Ten phantoms were randomly generated to create our training set. (Two
of these are shown in Fig. 4.) Based on them, we collected the Gibbs prior
information by simply counting the occurrences of each possible configuration
of a 3 × 3 window over all the images of our training set. This produces a look-up
table, and hence a Gibbs distribution, as explained in Section 2.
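
The counting step can be sketched as follows. The encoding of a 3 × 3 window as a 9-bit integer and the treatment of pixels outside the image as black are assumptions of this sketch; how the counts are turned into the look-up table of local energies is described in Section 2 and is not reproduced here.

```python
from collections import Counter


def window_code(img, r, c):
    """Encode the 3 x 3 window centred at (r, c) as a 9-bit integer.
    Pixels outside the image are treated as black (0); this border rule
    is an assumption of the sketch."""
    code = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            i, j = r + dr, c + dc
            inside = 0 <= i < len(img) and 0 <= j < len(img[0])
            code = (code << 1) | (img[i][j] if inside else 0)
    return code


def count_configurations(training_images):
    """Count how often each 3 x 3 configuration occurs over all images."""
    counts = Counter()
    for img in training_images:
        for r in range(len(img)):
            for c in range(len(img[0])):
                counts[window_code(img, r, c)] += 1
    return counts
```

With ten 63 × 63 training phantoms, the counts sum to 10 × 3,969 window positions spread over at most 512 configurations.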

The phantoms were defined on the square grid with height and width equal
to 63 pixels. Thus, in our experiments, each image has 63 × 63 = 3,969 pixels.
The phantoms and the raysums were generated using the software SNARK93 and
the pixel size used was 1 mm, producing 63 mm × 63 mm images. Using SNARK93,
we added noise to the raysum generation, producing raysums corrupted by additive
noise of mean 0.0 and standard deviations (σ) equal to 0.0 (noiseless case), 0.5
and 1.0. Since SNARK93 generates the projection data based on the geometrically
described objects, even the “noiseless” data are only approximations of the
projections of binary images of the discretized phantoms.

Fig. 5. A phantom (upper left corner) and its reconstructions using noiseless raysums
(upper right corner) and raysums with additive noise of mean 0 and standard deviation
(σ) of 0.5 (lower left corner) and 1.0 (lower right corner)

In our experimental study we investigated the actual benefit of prior
information for cardiac cross-sectional binary image reconstruction. Our testing
set consisted of 10 phantoms (from the same ensemble as the training set, but
statistically independent), and for each phantom and each noise level (0.0, 0.5
and 1.0) three projections were generated: horizontal (←), vertical (↓) and
diagonal (↙). Since the image size was 63 × 63, for each phantom and noise level
we produced 63 horizontal, 63 vertical and 125 diagonal raysums, adding to a
total of 251 raysums. The algorithm received as input the raysums generated by
SNARK93 and values for α and β in (2). The values of α and β for the noiseless
case were selected based on the Gibbs prior (look-up table) that was generated
by scanning the images of the training set and counting the pixel configurations
on a 3 × 3 window. Using this knowledge, we selected α = 23.0 and β = 0.1 for
the experiments using noiseless raysums. The selected α and β values balance
the contribution of the raysums and the Gibbs prior to the image reconstruction.
Since, in the other cases, some noise was introduced into the raysum generation,
we selected smaller α values, α = 18.4 for σ = 0.5 and α = 13.8 for σ = 1.0
(reflecting the change in our confidence level in the raysums), while maintaining
β = 0.1 for both cases.
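
The raysum count quoted above can be checked with a small sketch. Note the assumptions: the raysums here are plain pixel sums of a discretized image (SNARK93 instead computes them from the geometrical description), the diagonal direction is taken along anti-diagonals, and the additive noise is drawn from a Gaussian, which the text does not state.

```python
import random


def raysums(img):
    """Horizontal, vertical and diagonal raysums of a square binary image."""
    n = len(img)
    horizontal = [sum(row) for row in img]
    vertical = [sum(img[r][c] for r in range(n)) for c in range(n)]
    diagonal = [0] * (2 * n - 1)  # one ray per anti-diagonal (assumption)
    for r in range(n):
        for c in range(n):
            diagonal[r + c] += img[r][c]
    return horizontal, vertical, diagonal


def add_noise(values, sigma, rng):
    """Additive zero-mean noise with standard deviation sigma
    (Gaussian noise is an assumption of this sketch)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


blank = [[0] * 63 for _ in range(63)]
h, v, d = raysums(blank)
total = len(h) + len(v) + len(d)  # 63 + 63 + 125
```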

Fig. 6. A phantom (upper left corner) and its reconstructions using noiseless raysums
(upper right corner) and raysums with additive noise of mean 0 and standard deviation
(σ) of 0.5 (lower left corner) and 1.0 (lower right corner)

For any binary image we define its energy as the sum of the local energy
function over all pixels; i.e., as Σ_h I_h(ω), with the sum taken over all
pixels h. In all experiments, the program outputs the image with the highest
energy after 50,000 cycles (excluding the first 5,000 cycles, during which the
totally black starting image could still have an influence on the image energy).
In each cycle the algorithm randomly visits 3,969 pixels in the image and
performs the modified Metropolis step as defined in Section 4. Using the phantom
and the output image, we computed their energy difference, the number of pixels
for which the output has a different color from the phantom and the total
difference of their projections. Another quality measurement used was the
absolute difference (measured in pixels) of the areas of the objects
representing the right ventricle, left ventricle and left atrium, in the phantom
and in the reconstruction. In a reconstruction an “object” is defined as a
component (a maximally connected subset) of the set of white pixels under
8-adjacency (two pixels are 8-adjacent if they share a corner or an edge). Two
examples of a phantom (phantoms number 3 and 7) and the corresponding
reconstructions using the raysums generated with the three different noise
levels are shown in Figs. 5 and 6. (In this initial work we concentrated on
investigating the possibility of accurate reconstructions from triplane data and
paid no attention to the efficiency of implementation. Because of this the total
computer time for the 50,000 cycles is 5 hours on a Sun Ultra 10 at 300 MHz.)
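
The definition of an “object” as an 8-connected component of white pixels can be made concrete with a short sketch; the breadth-first search and the function name `object_areas` are choices of this illustration, not taken from the original implementation.

```python
from collections import deque


def object_areas(img):
    """Areas (in pixels) of the 8-connected components of white pixels,
    sorted from largest to smallest."""
    n_rows, n_cols = len(img), len(img[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    areas = []
    for r in range(n_rows):
        for c in range(n_cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                i, j = queue.popleft()
                area += 1
                # Visit all pixels sharing an edge or a corner with (i, j).
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        a, b = i + di, j + dj
                        if (0 <= a < n_rows and 0 <= b < n_cols
                                and img[a][b] == 1 and not seen[a][b]):
                            seen[a][b] = True
                            queue.append((a, b))
            areas.append(area)
    return sorted(areas, reverse=True)
```

Comparing the areas of matching objects in the phantom and in the reconstruction gives the area differences reported in Table 2.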

Table 1. Table reporting the phantom numbers and energy values (Energy), and the
energy difference (DEnr, where DEnr = Enr_phantom - Enr_reconstruction), pixel
difference (DPi) and projection difference (DPr) for reconstructions using raysums
corrupted by three different noise levels (σ = 0.0, 0.5 and 1.0). The energy
difference average is computed using the sum of the absolute values of the energy
differences and all values in the last row (Avg) are calculated by averaging the ten
individual values in the same column. (Negative values of DEnr indicate that the
energy of the reconstruction is higher than that of the phantom)

   Phantom             σ = 0.0              σ = 0.5              σ = 1.0
Num     Energy      DEnr  DPi  DPr      DEnr  DPi  DPr      DEnr  DPi  DPr
  1   36480.53    -45.43   35   23    -11.40   54   42    596.05   98   82
  2   36087.34    -50.89   60   24    -54.48   79   60    442.01  157  157
  3   36986.62   -220.74   51   27    -27.03   60   61    662.92  105  106
  4   37427.52    -77.94   72   23    -34.01   96   46    421.38   94   88
  5   36344.12    -32.58   39   24    149.69   68   49    556.40  128   95
  6   36171.59     16.10   44   26    148.82   80   55    682.61  127   84
  7   36414.56    -30.55   41   31     32.81   61   40    751.89  141   97
  8   36905.06    -97.57   49   24    -62.91   64   46    705.68  104  100
  9   36273.46    -56.87   59   20    155.52  109   51    532.50  165   91
 10   36064.98    -40.43   60   26     84.34   77   46    507.49  118   89
Avg   36515.58    -63.18   51   25     38.13   75   50    585.97  124   99

As can be seen in Table 1, all ten phantoms were reconstructed successfully
for all three noise levels. The table shows the results for the ten phantoms of
the testing set with the three different noise levels for each one. The energy
difference (DEnr) was computed as DEnr = Enr_phantom - Enr_reconstruction, the
pixel difference (DPi) reports the number of pixels that were different between
the phantom and the reconstruction, and the projection difference (DPr) reports
the sum of the absolute differences between the data (phantom raysums) and
the projection sums for the reconstructed image. The total numbers of pixels and
projections were equal to 3,969 and 251, respectively. The last row of Table 1
reports the average (Avg) of the absolute energy differences, pixel differences
and projection differences for the ten phantoms. The average pixel differences
(DPi) and projection differences (DPr) were rounded to the nearest integer. The
average percentages of misclassified pixels are 1.3% (if σ = 0.0), 1.9% (if
σ = 0.5) and 3.1% (if σ = 1.0).
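
The quality measures DPi and DPr, and the percentages quoted above, can be reproduced with a few helper functions. The function names are hypothetical; the percentages are simply the average DPi values of Table 1 divided by the 3,969 pixels of an image.

```python
def pixel_difference(phantom, reconstruction):
    """DPi: number of pixels whose colour differs between the two images."""
    return sum(p != q
               for prow, qrow in zip(phantom, reconstruction)
               for p, q in zip(prow, qrow))


def projection_difference(data, reconstruction_sums):
    """DPr: sum of absolute differences between the given raysums and the
    projection sums of the reconstructed image (both as flat lists)."""
    return sum(abs(m - d) for m, d in zip(data, reconstruction_sums))


def misclassified_percent(dpi, n_pixels=3969):
    """Percentage of misclassified pixels for a 63 x 63 image."""
    return 100.0 * dpi / n_pixels


# Average DPi values from the last row of Table 1.
percentages = [round(misclassified_percent(dpi), 1) for dpi in (51, 75, 124)]
```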

The numbers reported for the phantoms in Table 2 refer to the total number
of pixels in the three objects, while the numbers under LA, LV and RV report
the absolute difference of the areas measured for the three objects in the
phantom and reconstructions. The area difference averages (DA) reported in
Table 2 contain information about all ten phantoms (rounded to the nearest
integer). The average percentage errors in the areas are 1.0% (if σ = 0.0),
1.3% (if σ = 0.5) and 2.6% (if σ = 1.0).

Table 2. Table reporting the areas for the three objects (left atrium, left ventricle and
right ventricle) in the phantoms and the absolute difference between the object
areas in the phantom and in the reconstructions using raysums corrupted by three
different noise levels (σ = 0.0, 0.5 and 1.0). The last row reports the average of such
differences (DA) for each object over all phantoms rounded to the nearest integer

         Phantom         σ = 0.0         σ = 0.5         σ = 1.0
Num    LA   LV   RV    LA   LV   RV    LA   LV   RV    LA   LV   RV
  1   293  441  170     0    1    8     0    2    4    11    8    3
  2   213  539  238     7    1    1     7    3    4     8    2   16
  3   197  355  119     0    1    0     2    5    8     5    9   11
  4   109  439  101     2    2    1     1    2    0     2    3   11
  5    97  527  358     3    4    2     3    1    4     7    3    2
  6   177  469  382     1    1    2     3   10    3    10    3    4
  7   177  389  367     2    7    2     5    2    0     5    1    2
  8   177  349  255    10    1    1     8    2    3    18    7    3
  9   185  583  218     1    2    0     4    5   12     1    5   15
 10   225  515  349     3    2    3     0    1    0     2   11    7
 DA   182  452  258     3    2    2     3    3    4     7    5    7

Since the choices of α and β in (2) and the “measurement model” expressed
in (3)-(5) are somewhat arbitrary, we have looked into the possibility of
improving on our results by a more careful choice of these. To date we have not
succeeded in doing this; other models we tried did not improve upon the results
reported in Tables 1 and 2.



6 Conclusions 

We have shown how Gibbs priors can be defined and used in binary reconstruc- 
tion problems. Experimental tests were done for the case when data are known 
for two or three projections. An algorithm based on the Ryser graph and the 
Metropolis algorithm was tested and it was found that two views were not suffi- 
cient to determine the object even if the data are noiseless and the Gibbs prior 
is based on the very pictures to be reconstructed. On the other hand, in the case 
of three views, our results indicate that a similar approach could be useful in 
triplane cardiac angiography even in the presence of noise in the data. 

A modified Metropolis algorithm based on the known Gibbs prior proved 
to provide a good tool to move the reconstruction process towards the correct 
solution when the projection data by themselves are not sufficient to find such a 
solution. Our experiments suggest that if an algorithm is able to maximize the 
Gibbs probability subject to consistency with the data, then it is likely to be able 
to (nearly) recover a random image from the Gibbs distribution. This supports 
our hypothesis posed in the introduction, namely that if an image is a typical 
member of a class of images having a certain Gibbs distribution, then by using 
this information we can usually limit the class of possible solutions to only those 
which are close to the (unknown) image which gave rise to the measurement 
data. 

Abstract. Surgical navigation systems are used intraoperatively to provide
the surgeon with a display of preoperative and intraoperative data
in the same coordinate system and help her or him guide the surgery.
However, these systems are subject to inaccuracy caused by intraoperative
brain movement (brain shift) since commercial systems in use today
typically assume that the intracranial structures are rigid. Experiments
show brain shifts up to several millimeters, making it the cause of the
dominant error in the system. We propose an image-based brain shift
compensation system based on an intraoperatively guided deformable
model. We have recorded a set of brain surface points during the surgery
and used them to guide and/or validate the model predictions. Initial
results show that this system limits the error between its brain surface
prediction and real brain surfaces to within 0.5 mm, which is a significant
improvement over the systems that are based on the rigid brain
assumption, which in this case would have an error of 3 mm or greater.



1 Introduction 

The use of surgical navigation systems has become a standard way to assist
the neurosurgeon in navigating within the intraoperative environment, planning
and guiding the surgery. One of the most important features of these systems is
the ability to relate the position of the surgical instruments to the features in
the preoperative images. Ideally, they should provide a 3D display of the
neuroanatomical structures of interest and include visualization of surgical
instruments within the same frame. In order to be reliably used, the surgical
navigation system should be as precise as possible, preferably to within the
voxel size of the used dataset. Most of the current systems use
preoperatively-acquired 3D data and register it to the patient coordinate system.
However, they assume that the brain and other intracranial structures are rigid
and fixed relative to the skull. The preoperative data is registered to the
patient coordinate system at the beginning of the surgery. While this can be done
with a precision to within 1 mm at the initial moment, since the brain deforms
over time, the accuracy of the system deteriorates. The median brain shift of
points on the brain surface was estimated to range from 0.3 mm to 7.4 mm. It is
clear that the system based on the rigid brain assumption cannot achieve a
precision better than a few millimeters at the outer structures. Since the deeper
brain structures deform less than the outer ones, the error is the largest at the
cortical surface. The brain deforms even more after interventions, e.g.
post-resections. Furthermore, the average brain shift for cases in which hematoma
or tumors were removed was reported to be 9.5 mm and 7.9 mm, respectively. In
such cases the error is even larger.

In our research we are mainly concerned with (but not limited to) issues
surrounding epilepsy surgery. To quantitatively investigate such a case we have
recorded six points on the exposed brain surface approximately every ten
minutes during the surgery, starting when the dura was opened. The mean shift in
the direction perpendicular to the brain surface was about 3 mm. The initial
and final set of points displayed over the rigid (initial) brain surface are shown
in Fig. 1. This result clearly shows the need for a high quality intraoperative
3D acquisition system and/or a method for estimating brain shift. The tradeoffs
among different approaches to these problems are discussed later in the paper.
The approach we have taken is to use a biomechanically-based deformable model
that incorporates the effects of gravity and can be driven by intraoperative
measurements. Currently, we have performed only partial validation of the
deformation results, since a full human in-vivo 3D validation is practically
difficult with current technology. With this system, 3D estimation of the brain
shift can be performed in real time, i.e. at least as fast as the real brain
deformation, and for this reason can be used in an actual operating room (OR)
application. This project is a continuation of our endeavors to overcome the
brain shift problem in surgical navigation, initiated by our previously reported
modeling efforts. This work extends the model, puts the model in touch with real
data and discusses our plans for a complete brain shift compensated surgical
navigation system. We also note relevant work in soft tissue deformation.

Fig. 1. Intraoperatively recorded points on the brain surface at the beginning of the
surgery are shown at left, while their positions one hour later relative to the
non-deformed (initial) brain surface are shown at right. Gravity is perpendicular to the
sagittal plane. The points moved in the direction of gravity and they are hidden under
the brain surface (only one of the points is still visible in the figure at right). Since
the brain deformed (in the direction of the gravity vector) the surface points moved
relative to the original (initial) brain surface

2 System Overview 

Our approach to brain shift compensation is to deform an intraoperatively-guided
model and use the model data during the surgery to display preoperative data
deformed according to the current model state. Therefore we propose
an image-based brain shift compensation system made up of several components:
segmentation, mesh generation, a model, registration of the model to the
intraoperative environment, driving and guiding the model, and displaying the
deformed data.

The first step is segmentation of the brain tissue and the skull since they
are the two most important parts of the model. For brain tissue extraction we
have adopted a previously published automatic segmentation algorithm, enhanced
with a few pre- and post-processing steps. Eventually, the skull segmentation
will be done from CT scans, and then it will be registered with the initial MRI
data. However, for this preliminary effort, we approximate the inner skull
surface segmentation using dilation and erosion operators applied to the
previously segmented brain tissue. An output is shown in Fig. 2. It is important
to have the inner skull surface available for the model since it defines the
boundary conditions. Clearly, the brain is bounded by the skull and it cannot go
outside it. When a gravitational force is applied to the brain, the brain as a
whole shifts slightly downward, but non-rigidly, and from the bottom and sides
it is resisted by the inner skull surface. Therefore, the largest deformation is
on the top of the brain.

For object surface rendering we have used an improved version of the
algorithm suggested in [13]. Some of the surfaces produced by this algorithm can
be seen in several of the figures in this paper.

In order to display and use brain surface points a correspondence between
the patient and MRI dataset coordinate systems has to be established. We used
a set of markers placed on the patient’s skin. In the OR, the marker coordinates
were recorded using a mechanical localizer. In addition the markers were
manually localized in the MRI dataset. Next, a robust point matching algorithm
for resolving the correspondence and finding the optimal transformation between
the two sets was applied. It is important to notice that for various reasons, the
surgeon is not always able to touch all of the markers. Therefore one of the two
sets could contain outliers. Our point matching algorithm covers such cases in an
automatic fashion. The result of the matching between the two sets of markers
is shown in Fig. 3. The same mechanical localizer is used to record the points on
the brain surface during the surgery. Once the correspondence was established,
the brain surface points were transformed to the MRI coordinate system.

Fig. 2. An output of the inner skull surface segmentation algorithm in three orthogonal
slices: (a) axial, (b) coronal and (c) sagittal. The brain tissue is colored white, inner
skull surface gray and CSF black

The next step is to generate the model mesh from the segmented brain tissue.
Here we use prism (“brick”) elements, having 8 nodes at the vertex positions.
The output of our mesh generator is shown in Fig. 4. The mesh does not capture
all of the fine details of the segmentation output, since this mesh density
allows for reasonable performance (in terms of errors). A much finer mesh that
would capture all brain geometric details (i.e. all sulcal structures) would have
too many nodes and would slow down computation, not achieving a significant
improvement in performance. Once the current node positions are known, any
information obtained prior to surgery can be deformed according to the model
interpolation functions (the trilinear back interpolation used for this purpose is
explained in the next section). Currently, we use the model to deform the MRI
gray scale image slices (three orthogonal slices) using texture maps and the outer
brain surface, but it can as easily be used to deform additional CT, functional
MRI, MRA or any other volumetric preoperatively-acquired data with update
speed limited only by the graphics capabilities of the display engine.

3 Model 

3.1 Brain Tissue Modeling 

Fig. 3. The lighter points are the marker positions obtained from the MRI dataset,
while the darker ones are the markers touched by the surgeon in the OR. These two
figures show the marker sets after the correspondence has been established. There are
12 markers, but in this case, the surgeon managed to touch only 10 of them. Our
correspondence algorithm handles the outlier problem as well

According to our findings and findings of other groups, brain shift
is a relatively small deformation and a slow process. This fact facilitates our
approach to brain tissue modeling. As we move in these directions we also note
relevant work in soft tissue deformation. Here, we employ
a linear stress-strain relation, which is a good approximation for small tissue
displacements. The model consists of a set of discrete interconnected nodes each
representing a small part of the brain tissue. Nodes have masses depending
on the size of the volume they represent and on the local tissue density. Each
connection is modeled as a parallel connection of a linear spring and dashpot,
known as the Kelvin solid model. As for the nodes, the connection
parameters can depend on their position in the brain. The Kelvin solid model
is a model for a visco-elastic material subject to slow and small deformations,
which is exactly the case with brain shift. It is also a rather simple approach,
which is a desirable property since the model deformation should be computed
in real time, i.e. at least as fast as the brain deformation, since
it must be utilized (e.g. displayed) during the surgery. The constitutive relation
for the Kelvin solid model is



    \sigma = q_0 \epsilon + q_1 \dot{\epsilon},    (1)

where σ is stress and ε strain, while q₀ and q₁ are local parameters. The dotted
variables represent the time derivatives, e.g. \dot{\epsilon} = d\epsilon / dt.

Fig. 4. The mesh generator output. The left figure shows the mesh, while the right one
shows the mesh and the outer brain surface

Equation (1) can be rewritten in the following way. If two nodes are at
positions r₁ and r₂, have velocities v₁ and v₂, and are connected in the above
fashion, then the force acting on the first node is

    f_{inner}(r_1, r_2, v_1, v_2) = [ k_s ( \| r_2 - r_1 \| - r_{12} ) - k_d (v_2 - v_1) \cdot n_{21} ] n_{21},    (2)

where k_s is the stiffness coefficient, k_d is the damping coefficient and r₁₂ is
the rest length of the spring connecting the two nodes. In a general case they can
vary from connection to connection depending on the local material properties.
n₂₁ is the unit vector from r₁ to r₂. Note that the same force acts on the other
node but in the opposite direction.
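
A direct transcription of (2) into code is sketched below; the function name and the use of plain Python lists for 3D vectors are choices of this illustration. The damping term is written with the sign exactly as printed in (2). With n₂₁ pointing from r₁ to r₂, a dissipative dashpot would need that term to oppose the relative motion of the two nodes, so the sign should be checked against the intended convention before the function is used in a simulation.

```python
import math


def inner_force(r1, r2, v1, v2, k_s, k_d, rest_length):
    """Force on the first node according to (2), transcribed as printed.

    r1, r2: node positions; v1, v2: node velocities (sequences of floats).
    k_s: stiffness, k_d: damping, rest_length: rest length r_12 of the spring.
    """
    diff = [b - a for a, b in zip(r1, r2)]
    dist = math.sqrt(sum(x * x for x in diff))
    n21 = [x / dist for x in diff]  # unit vector from r1 to r2
    # Component of the relative velocity along the connection.
    rel_speed = sum((b - a) * n for a, b, n in zip(v1, v2, n21))
    magnitude = k_s * (dist - rest_length) - k_d * rel_speed
    return [magnitude * n for n in n21]
```

The force on the second node is the negative of the returned vector.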



3.2 Modeling the Brain-Skull Interaction

The brain-skull interaction, as modeled in our initial efforts, is a highly
nonlinear function, and significantly slows down the adaptive step-size numerical
integration. The consequence was that the steady state for this previous 3D
model was reached in approximately four hours, which is much slower than
the real brain deformation, and therefore the model could not be used for display
updating during the surgery. A coarse approximation could be to make the outer
brain surface nodes rigid in the bottom part of the brain (bottom with respect to
gravity) as used in [11]. However, we think that having the brain-skull interaction
contributes to the total precision of the system.

For this reason we now use an alternate approach. Prior to the simulation,
the skull and brain tissue have to be segmented. Ideally, the MRI scan would be
used for brain tissue segmentation, and CT scan for skull segmentation, but for
the aforementioned reason we have used the procedure explained in the previous
section to extract the inner skull surface. The brain-skull interaction is not
directly incorporated in the model equations, but rather incorporated via the
numerical integration, through a contact algorithm. As the model evolves over
time, when a node enters the skull area, it is returned to its previous position (to
its position from the previous step in the numerical integration). This prevents
nodes from entering the skull, but permits them to come arbitrarily close to it
(more precisely, close up to the precision set in the numerical integration) and
to move along the skull surface if pulled by forces that are not perpendicular to
the skull surface. Effectively, nodes can move freely unless they reach the skull,
in which case they can move only in the direction tangential to the skull surface.
This behavior is identical to the one achieved by the brain-skull interaction
of our earlier model, but it is much faster to simulate. As a result, the 3D model
now needs about 10 minutes to reach the steady state, which is faster than the
actual brain deformation (which takes approximately half an hour). Thus, this
model can potentially be used during the surgery, which is our eventual goal.
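
The contact rule described above amounts to a per-node check after each integration step. In the sketch below, `inside_skull` is a hypothetical predicate (for instance a lookup into the segmented inner skull surface volume); only the positions are reverted, since the text does not say how the velocities of such nodes are treated.

```python
def apply_contact(previous_positions, new_positions, inside_skull):
    """Return the node positions after the contact check: a node that would
    enter the skull area is returned to its position from the previous
    integration step; all other nodes keep their new position."""
    return [prev if inside_skull(new) else new
            for prev, new in zip(previous_positions, new_positions)]


# Toy usage: the "skull" is the half-space z < 0 (an assumption for the demo).
def below_floor(position):
    return position[2] < 0.0


before = [(0.0, 0.0, 1.0), (0.0, 0.0, 0.5)]
after = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.1)]
checked = apply_contact(before, after, below_floor)
```

Here the first node moves freely, while the second one, which would have crossed the boundary, stays where it was.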

3.3 The Model Equations 

Newton’s Second Law for each node j in the model gives

    m^j a^j = m^j g + \sum_{i} f_{inner}^{j, s_i^j},    (3)

where m^j is the node’s mass, a^j is its acceleration, f_{inner}^{j, s_i^j} is
the interaction between nodes j and s_i^j defined by (2) and g is the gravity
acceleration, while {s_1^j, s_2^j, ...} is the set of the neighboring nodes of
the node j. Equation (3) represents a system of second order nonlinear ordinary
differential equations.

One can define the state variables to be X 2 j-i = and X 2 j = for j = 
1, . . . ,N, where N is the number of the brain model nodes. Obviously, X 2 j-i = 
X 2 j. The expression for X 2 j follows directly from @, since X 2 j = ^X 2 j — aK It 
depends only on state variables but not on their time derivatives. Now it is clear 
that (3) can be rewritten in the compact state-space form X = f{X), where X 
is the vector of the state variables and X = It is assumed that the brain 
starts deforming from a rest position, i.e. v^{t = 0) = 0 for all j. The initial node 
positions (t = 0) were obtained from the preoperative images, as discussed in 
the previous section. 

The system in state-space form is suitable for numerical integration (see [ 1 ) . 
In this case the fourth order Runge-Kutta method with adaptive stepsize was 
employed. The brain-skull interaction is implicitly included in the numerical 
integration as explained in the previous section. 
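The integration with the contact rule can be illustrated by a minimal sketch. It is not the implementation used here: it takes a fixed-step fourth order Runge-Kutta step instead of the adaptive stepsize, models a single node attached to a fixed anchor by an assumed linear spring-damper, and uses the half-space below the plane z = 0 as a hypothetical skull. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def rk4_step(f, x, dt):
    """One classical fourth order Runge-Kutta step for dX/dt = f(X)."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def make_rhs(mass, k_s, k_d, anchor, rest_len, g):
    """Right-hand side f(X) for one node; X = [position (3), velocity (3)]."""
    def rhs(x):
        r, v = x[:3], x[3:]
        d = r - anchor
        length = np.linalg.norm(d)
        u = d / length
        # Assumed linear spring-damper connection (illustrative stand-in
        # for the inner force between two nodes).
        f_inner = -(k_s * (length - rest_len) + k_d * np.dot(v, u)) * u
        a = g + f_inner / mass           # equation (3) divided by the mass
        return np.concatenate([v, a])    # d(position)/dt = v, d(velocity)/dt = a
    return rhs

def inside_skull(r):
    """Hypothetical skull region: the half-space below the plane z = 0."""
    return r[2] < 0.0

def simulate(x0, rhs, dt, n_steps):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x_new = rk4_step(rhs, x, dt)
        if inside_skull(x_new[:3]):
            # Contact: the node is returned to its position from the previous
            # integration step. The velocity is left as integrated, which is
            # an assumption; the text does not say how it is treated.
            x_new[:3] = x[:3]
        x = x_new
    return x
```

With a spring whose free equilibrium lies below the plane, the node sinks under gravity and then stays just above z = 0, within one integration step of the hypothetical skull.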

3.4 Interpolation 

The output of the numerical integration is the set of model nodes over time. One 
usually wants to display deformed gray scale data (e.g. from preoperative MRI) 
using texture maps, brain structure surfaces or any other preoperative data. For 
this purpose we have employed trilinear interpolation. 

The texture map (we use texture maps to display three orthogonal slices in 
the MRI datasets) deformation and the brain surface deformation are principally 
different procedures. In the case of the texture map deformation for a given 
voxel position in the current frame, one should find the element (“brick”) it
belongs to, find the voxel local coordinates in the element, and then find the voxel 
position in the original (initial) model state, using the same local coordinates 
in the corresponding initial element. Since the corresponding initial position is 
generally not exactly at a voxel center we perform an interpolation among the 
neighboring voxels. This procedure is referred to as back interpolation. The brain 
surface deformation is a reverse process. For a given point in the initial state, one 
should find out the element it falls in and the corresponding local coordinates 
and then, using the same local coordinates find the new (deformed) coordinates 
in the same element in the current frame. 

The trilinear interpolation, i.e. the dependence between the global $x$, $y$ and
$z$ coordinates and the local element coordinates $\alpha$, $\beta$ and $\gamma$, is given by the
following equation:

$$\begin{aligned}
x &= (c_1^x + c_2^x \alpha)(c_3^x + c_4^x \beta)(c_5^x + c_6^x \gamma),\\
y &= (c_1^y + c_2^y \alpha)(c_3^y + c_4^y \beta)(c_5^y + c_6^y \gamma),\\
z &= (c_1^z + c_2^z \alpha)(c_3^z + c_4^z \beta)(c_5^z + c_6^z \gamma).
\end{aligned} \tag{4}$$

The equation (4) can be expressed using matrix notation in the following
way,

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = A \begin{bmatrix} 1 & \alpha & \beta & \gamma & \alpha\beta & \alpha\gamma & \beta\gamma & \alpha\beta\gamma \end{bmatrix}^T .$$

It is obvious that the function is nonlinear, but it is linear with respect to any
single local coordinate (e.g. it is linear with respect to $\gamma$). Therefore it is called
a trilinear function. To be strict, this function should be called a “tri-affine
function” but it is commonly referred to as being trilinear. The 24 elements of
the matrix $A$ are uniquely determined from the fact that the local coordinates
take either 0 or 1 at the eight element vertex positions. It is easy to show that
this interpolation provides $C^0$ continuity.

One can directly compute the global coordinates for given local coordinates
using (4). However, it is not simple to solve (4) for $\alpha$, $\beta$ and $\gamma$ (the solution
expressions are huge). For this reason we have used an iterative search method to
determine local coordinates for given global coordinates. The method converges
very fast, i.e. it achieves the given precision (0.1 mm) in several iterations.
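The forward mapping and its iterative inversion can be sketched as follows. The sketch writes the trilinear map directly in terms of the eight vertex positions of a brick rather than the coefficient matrix, and it uses Newton's method for the inverse. The text only states that an iterative search was used, so the choice of Newton's method, the function names and the starting point are assumptions.

```python
import numpy as np

def trilinear(vertices, local):
    """Map local (alpha, beta, gamma) to global (x, y, z).

    vertices[i, j, k] is the global position of the element vertex whose
    local coordinates are (i, j, k), with i, j, k in {0, 1}; shape (2, 2, 2, 3).
    """
    a, b, c = local
    wa = np.array([1.0 - a, a])
    wb = np.array([1.0 - b, b])
    wc = np.array([1.0 - c, c])
    return np.einsum("i,j,k,ijkd->d", wa, wb, wc, vertices)

def jacobian(vertices, local):
    """3x3 matrix of partial derivatives d(global) / d(local)."""
    a, b, c = local
    wa = np.array([1.0 - a, a])
    wb = np.array([1.0 - b, b])
    wc = np.array([1.0 - c, c])
    d = np.array([-1.0, 1.0])   # derivative of the weights [1 - t, t]
    ja = np.einsum("i,j,k,ijkd->d", d, wb, wc, vertices)
    jb = np.einsum("i,j,k,ijkd->d", wa, d, wc, vertices)
    jc = np.einsum("i,j,k,ijkd->d", wa, wb, d, vertices)
    return np.stack([ja, jb, jc], axis=1)

def inverse_trilinear(vertices, target, tol=0.1, max_iter=20):
    """Find local coordinates of a global point (tol in the units of target)."""
    local = np.array([0.5, 0.5, 0.5])   # start at the element centre
    for _ in range(max_iter):
        residual = trilinear(vertices, local) - target
        if np.linalg.norm(residual) < tol:
            break
        local = local - np.linalg.solve(jacobian(vertices, local), residual)
    return local
```

For back interpolation, `inverse_trilinear` would be applied with the vertices of the deformed element, and `trilinear` would then be evaluated at the same local coordinates with the vertices of the corresponding initial element.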

An example of texture map deformation is given in Fig. 5. The increase in
the gap between the skull and brain at the top is small (approximately 3 mm;
see Table 1 in the next section for the maximal surface movement value). The
MRI dataset is a 256 by 256 by 124 T1-weighted sequence (1 mm by 1 mm by
1.5 mm).



Fig. 5. The texture map deformation. The left figure shows the initial state while the
right shows the final state. Note the increase in the gap between the skull and brain 
at the top 



4 Parameter Estimation 

The global problem in modeling, especially in modeling heterogeneous materi- 
als, is reliable model parameter setup and estimation. The approach we have 
employed here is to use intraoperative measurements to estimate model param- 
eters. 

Although our model allows for local parameter control, we still assume a
homogeneous model for two reasons. First, it is very difficult to estimate the
brain tissue parameters locally and second, there are contradictory reports in
the literature regarding white and gray matter stiffness properties. Even in the
case of a homogeneous model there are two parameters to be estimated: the stiffness
coefficient $k_s$ and the damping coefficient $k_d$ in the node interaction model defined earlier.

For parameter estimation we have used so-called off-line parameter estimation,
where the whole sequence of the recorded (and registered) brain surface
“time” points was used. Practice shows that the steady state does not depend
on the choice of the damping coefficient, but only on the stiffness coefficient. The
damping coefficient determines how fast the steady state will be reached, while
the stiffness coefficient determines the final shape of the brain.

For this reason we use the steady state to estimate the stiffness coefficient.
An approximate value for this coefficient is initially assumed, the model is driven
to the steady state and the signed average distance from the six recorded points
to the model surface is computed. Based on the signed average distance a new
stiffness coefficient is chosen, and the procedure is repeated until the final average
signed error is small enough (we required it to be smaller than 0.5 mm).

Once the stiffness coefficient is determined, the damping coefficient is deter- 
mined in a similar fashion, but this time reducing the average signed distance in 
the transient period. 
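The stiffness search can be sketched as a one-dimensional root search on the signed average error. The text does not specify how the new stiffness coefficient is chosen at each iteration; the bisection rule below is an assumption, and `toy_signed_error` is a hypothetical stand-in (with made-up values) for "drive the model to the steady state and measure the signed average distance".

```python
def estimate_stiffness(signed_error, k_low, k_high, tol=0.5, max_iter=50):
    """Bisection on the stiffness coefficient.

    signed_error(k) returns the signed average distance (in mm) between the
    recorded points and the steady-state model surface. It is assumed to
    change sign between k_low and k_high.
    """
    e_low = signed_error(k_low)
    e_high = signed_error(k_high)
    if e_low * e_high > 0:
        raise ValueError("bracket does not contain a sign change")
    for _ in range(max_iter):
        k_mid = 0.5 * (k_low + k_high)
        e_mid = signed_error(k_mid)
        if abs(e_mid) < tol:            # stopping rule from the text: < 0.5 mm
            return k_mid
        if e_low * e_mid < 0:
            k_high = k_mid
        else:
            k_low, e_low = k_mid, e_mid
    return 0.5 * (k_low + k_high)

def toy_signed_error(k, load=33.0, measured_sag=3.3):
    """Toy model: the surface sinks by load / k; values are illustrative only."""
    return load / k - measured_sag

k_est = estimate_stiffness(toy_signed_error, 1.0, 100.0)
```

The damping coefficient could be searched in the same way, with the error evaluated over the transient period instead of at the steady state.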






Table 1 shows the average distance between the rigid (initial) gray/CSF
brain surface and the recorded brain surface points over time (i.e. during the
operation) in the row “surface movement”. In addition, the row “model error” contains
the average error between the model prediction of the gray/CSF brain surface
and the recorded brain surface points over time. This table contains data for a
single patient undergoing epilepsy (implant) surgery. The surgeon touched six
points (measured their positions with the mechanical arm) every 7 minutes (on
average).



Table 1. Average brain surface movement and model error

time [min:sec]           0:00   7:40   14:40  19:40  24:40  34:52  49:00  max
surface movement [mm]    0.34   1.38   2.21   2.30   2.74   3.24   3.29   3.29
model error [mm]         0.34   0.45   0.30   0.13   0.20   0.32   0.04   0.45



One can see that the distance between the initial gray/CSF brain surface and 
the recorded brain surface points increases over time reaching 3.29 mm. The 
model with optimal parameters (determined in the off-line way) has a maximal
error of 0.45 mm over time. Clearly, the use of the model has done a reasonable
job of estimating the brain shift near the brain surface (where the error is the 
greatest). 

However, the off-line parameter estimation cannot be used in OR applica- 
tions, since at each moment only the current and previous measurements are 
available, not all the measurements over time. The parameters would need to be 
estimated using the available intraoperative data. An idea for on-line parame- 
ter estimation is to start with reasonable initial parameters, based on previous 
experiments (say, on other patients), and then to adjust the parameters accord- 
ing to the error between the model prediction and the measurements. At the 
moment, the intraoperative measurements we have are too sparse and noisy to 
allow for on-line parameter estimation. Refer to the Discussion section for our 
future work plans including on-line parameter estimation. 

5 Intraoperative Model Guidance 

In addition to designing a reasonable model, and estimating model parameters 
in an optimal sense, one can guide the model by intraoperative data. The idea 
is to readjust the model at the time points when the intraoperative measure- 
ments are available, and in between to let the model deform on its own. The 
model tries to predict the node positions at the moment of new measurements, 
new measurements are used to readjust the model, and so on. The denser the
intraoperative data are, both in space and time, the smaller the error between the
model and the real brain.

We have employed such an approach using the model parameters estimated
as described above. The results are given in Table 2 (this table also contains
data for a single patient). It is clear that performance has improved over the
case of the non-guided model. The maximal error in the case of the guided model
is 0.20 mm (not taking into account the initial state), but over time it is
even smaller because of the feedback-like guidance. The error does not approach
zero over time due to noise in the data.



Table 2. Average brain surface movement and model errors

time [min:sec]                 0:00   7:40   14:40  19:40  24:40  34:52  49:00  max
surface movement [mm]          0.34   1.38   2.21   2.30   2.74   3.24   3.29   3.29
non-guided model error [mm]    0.34   0.45   0.30   0.13   0.20   0.32   0.04   0.45
guided model error [mm]        0.34   0.20   0.05   0.01   0.03   0.08   0.02   0.20



The last two rows represent the same type of error as the model error in Table
1, but for the non-guided and guided models. One should be aware that the errors
reported in Table 2 for the guided model are very small since the six recorded brain
surface points are used to guide the model, and then the error is computed with
respect to them. A more realistic error estimation is given in the next section.
However, in an OR application one should use all available intraoperative data
to guide the model.

The results of the brain deformation modeling are shown in Fig. 6. The
brain model is deformed by taking into account the measurements (points), and
the surface of the brain is computed according to the current model state. One 
can see that the surface (prediction) matches the points (measurements). 



5.1 Validation 

To completely validate the model reliability one would need to obtain a dense 
time sequence of 3D brain datasets using intra-operative sensing, and then com- 
pare the model predictions to the actual deformations. This can be done by using 
intraoperative MRI, or maybe an intraoperative CT or ultrasound scanner. Since 
at this point we have brain surface points recorded over time as the only
intraoperative data, we used two of the points to guide the model and compared the
model predictions to the remaining four points. The results are given in Table 3.



Table 3. Average brain surface movement and model error

time [min:sec]                0:00   7:40   14:40  19:40  24:40  34:52  49:00  max
surface movement [mm]         0.34   1.38   2.21   2.30   2.74   3.24   3.29   3.29
validated model error [mm]    0.34   0.12   0.38   0.44   0.35   0.49   0.14   0.49



Fig. 6. The results of the model deformation: (a) represents the recorded points at
initial time with the initial brain surface. The points are on the brain surface. (b)
represents the final brain surface points with initial brain surface (darker) and final
brain surface (lighter). One can see that the brain surface points moved inside the
original brain. This is due to the effect of gravity that pulled the brain downwards. (c)
represents the final brain surface points and final brain surface. The points are again
on the brain surface. The final brain surface is obtained from the final model state,
i.e. it is a prediction of the surface, while the final points are the measurements on the
brain surface when the brain settled down (after approximately one hour from the
moment when the dura was opened). The prediction (surface) matches the measurements
(points)



As before, the last row represents the error defined in the same way as the
model error in Table 1, but in this case for the validated model (validated by the
four remaining points; the other two were used for model guidance). This partial
validation suggests that the use of our model reduces the error caused by the brain
shift, i.e. the difference between the current brain and the current model state is
smaller than the difference between the current brain and the initial brain.

6 Discussion 

The main contribution of the model we have suggested is the ability to
significantly reduce the error in real time¹. The model has 2088 nodes, 11733
connections and 1521 elements (bricks) and it takes typically less than 10 minutes
on an Octane SGI workstation (R10000 250 MHz processor) to reach the steady
state. This is a significant improvement compared to our previous model (see 0).
For these reasons it can be used in a real time application as a part of a surgical
navigation system. Once the model deformation is computed, any preoperative
data, including but not limited to brain structure surfaces, slice texture maps
of MRI, fMRI, CT or MRA data can be deformed accordingly. In other words,
knowing the model state (the deformed mesh) at a certain time ($t_0$), and knowing
the initial model state and, say, a preoperative fMRI dataset, one can directly
calculate the corresponding deformed fMRI data (at time $t_0$).

¹ By real time we mean the following. The brain deforms with a certain speed (it takes
about 30 minutes to assume a steady state). On the other hand, it takes a certain
time to simulate the brain deformation on a computer, i.e. to deform the model.
However, at say 8 minutes after opening the dura (8 minutes of the actual, surgical
time) the corresponding model state (the state that corresponds to the 8th minute of
the actual time) has already been computed and stored in the memory, and can be
used for displaying (deformed) surfaces, texture maps or whatever is needed. Thus,
the simulation of the brain deformation is computed faster than the actual brain
deformation, i.e. in real time. If this is not the case, i.e. if the simulation of the brain
deformation takes more time than the actual brain deformation, then the model cannot
be used during the surgery (in real time) for displaying the deformed surfaces and datasets.

As noted above, an alternative is to use an intraoperative MRI system (see 
m)- They certainly have their own advantages, but they are expensive, restrict 
surgical access to the patient, prevent usage of metal surgical tools and their 
spatial resolution is typically not as high as that of preoperative MRI. However, 
a combination of a deformable model and intraoperative MRI and/or CT would 
probably provide a surgical navigation system with a means to handle a variety of 
deformations potentially including those due to tissue removal with an acceptable 
precision. 

To successfully model the brain deformation one needs to take into account 
not only the soft tissue mechanics, but also neuro-anatomical knowledge. For 
instance, our neurosurgical colleagues observe that the cerebellum appears
not to deform, owing to the toughness of the tentorium. If this assumption is valid,
this part of the brain does not need to be modeled, and the deformable part of the
volume is reduced, improving the performance (both in precision and speed).

Our future work is eventually aimed at on-line parameter estimation, richer 
intraoperative data acquisition, including intraoperative (portable) CT imaging, 
CSF modeling and non-homogeneous brain tissue modeling. 
Abstract. Knowledge about the status of the female reproductive system
is important for fertility problems and age-related family planning.
The volume of these fertility requests in our emancipated society is
steadily increasing. Intravaginal 3D ultrasound imaging of the follicles 
in the ovary gives important information about the ovarian aging, i.e. 
number of follicles, size, position and response to hormonal stimulation. 
Manual analysis of the many follicles is laborious and error-prone. We 
present a multiscale analysis to automatically detect and quantify the 
number and shape of the patient’s follicles. Robust estimation of the 
centres of the follicles in the speckled echographic images is done by
calculating the so-called winding number of the intensity singularity, i.e. the
path integral of the angular increment of the direction of the gradient
vector over a closed neighbourhood around the point. The principal edges
on 200-500 intensity traces radiating from the detected singularity points
are calculated by a multiscale edge focussing technique on 1D winding
numbers. They are fitted with 3D spherical harmonic functions, from
which the volume and shape parameters are derived. 



1 Introduction 

Changes in societal behaviour have led to postponement of childbearing in most 
developed countries. In the Netherlands the mean age at which a woman gives 
birth to her first child has now risen to 30 years. As female fecundity decreases 
with advancing age an increasing number of couples is faced with unexpected 
difficulties in conceiving. It is estimated that approximately 15,000 couples visit 
infertility clinics in the Netherlands annually. For some 70% of these couples, 
age-related fecundity decline may play a role and a further increase is to be 
expected. 

The decline in the number of follicles containing oocytes from the ovary and 
a decrease in the quality of these oocytes are the main causes of the decline in 
female fecundity. This loss in functional capacity of the ovary is more rapid in 
some women. Identification of women with advanced loss in ovarian function has 
been quite difficult so far. Recent research has shown that the number of visible 
follicles, assessed by ultrasound in a group of women, is correlated with proven
fertility mm- Based on this observation, ultrasound-based follicle counts are 
being developed as a ‘test’ for reproductive age. This enables us to recognise 
infertile women with exhausted reproductive capacity and advise them to refrain 
from further diagnosis and treatment. Likewise, recognition of especially younger 
infertile women with advanced loss of follicles will lead to prompt referral for 
the application of assisted reproduction techniques. In our modern emancipating 
society, questions are being raised related to the planning tension between career 
and family: when I am young, what is the status of my reproductive system; can 
I safely postpone childbearing and first pursue a career? When I get older, until 
what age am I still likely to be able to conceive spontaneously? It is known that 
the decrease in number of follicles is bi-exponential, and accelerates after the 
age of 37 f4l I iSj (Fig. [IJ. Furthermore, the risk of damage to the ovary during 
chemo- or radiotherapy is another reason to predict the age when the follicular 
storage is depleted. 




Fig. 1. Two 2D slices from a 3D ultrasound image of a normal 23 year old volunteer. 
The follicles are clearly visible as dark hypo-echogenic circular areas in the ovary, which 
is visible as a slightly darker background in the central part of the images. Typical 
diameters of follicles are 2-8 mm, in a 2-4 cm diameter slightly ellipsoidal ovarian 
capsule 



For a large-scale evaluation of these application areas, high quality and au- 
tomated information about the ovarian anatomy, especially of the follicles, is 
needed. 3D ultrasound turns out to be a practical and cost-effective acquisition
mode. Manual counting and measuring of all the follicles by inspecting the
2D slices from a 3D dataset is tedious and time consuming, and often inaccu- 
rate. Automated analysis reports have been few and only in 2D jl5lltil21<22j . 
Ultrasound data are characterised by interference noise, a wide range of often- 
occurring artefacts, and low contrasts. So robust and noise resistant methods 
must be developed to find the follicle centres and contours. Often follicles are 
not spherical, particularly when they are touching each other, making Hough 
transform methods less suitable m- This paper describes a new scale-space 
based method to detect and delineate the follicles automatically and accurately
in 3D ultrasound. The paper focuses on the multiscale detection methods. A 
clinical paper describing the patient studies in detail is in preparation. 

2 Ovarian Anatomy 

The decline of the human quota of oocytes begins before birth, when ovarian aging
begins. At birth some million follicles are present, the number falling continuously
during childhood and adulthood until a few hundreds to thousands remain at the 
age of about 50 m During the total reproductive lifespan only a few hundred 
will reach full maturation and ovulation. The left and right ovaries are thumb-sized
structures, containing the collection of follicles. Follicles are round or oval
structures embedded in the tissue of the ovarian stroma. The wall of the follicles
comprises hormone-producing cells that are responsible for the production of
the fluid that is contained by the follicle wall, so the follicles are filled with liquid.
In transvaginal ultrasound they appear as clear low echoic spheres (Fig. 1). A prime
indicator for ovarian aging is the number of antral follicles exceeding a certain size,
their relative position in the ovary, and their responsiveness (expressed in growth rate)
to hormonal stimulation. This last measurement typically requires multiple
periodic measurements, i.e. daily measurements over 3-5 consecutive days.

3 3D Ultrasound 

2D vaginosonography can only yield sagittal and frontal sections of the lesser 
pelvis; 3D volume scanning, however, visualises all three perpendicular planes 
simultaneously on a monitor screen. The 3D ultrasound system (Combison 5600, 
Kretz Technik AG, Medicor, Austria / Korea) can be equipped with a 12 MHz 
transvaginal 3D probe of 2.2 cm diameter, focal distance of 2-10 cm. The system 
is capable of a full 3D image acquisition in about 2 seconds. From the pyramidal 
volumetric dataset a Cartesian dataset is extracted with equidistant voxels by 
interpolation. Sonographically, follicles with diameters of 3 mm and above can 
be detected reliably. 3D ultrasound has some important advantages over 2D 
imaging. Volume measurements using 2D ultrasound methods have been found 
to be much less accurate than 3D ultrasound methods for irregularly shaped 
objects H2|. It is a step towards interactive follicle puncturing [Sj . To prevent as 
much as possible the appearance of vessels just outside the ovary and to restrict
the field of view to the ovary proper, the operator, guided by 3 simultaneous
orthogonal multiplanar reformatted views, performs a 3D cut-off of the total
volume (maximum size $256^3$ voxels). Typical resulting datasets are 180x180x150
voxels (1 byte/voxel intensity range).

4 Detection of Follicle Centres by 3D Topological 
Winding Numbers 

The key clinical question is the automatic counting of the number of antral folli- 
cles and their size distribution as the indicator for fertility from noisy datasets. 
From Gaussian scale-space theory we know that the extraction of (invariant) 
differential structural information like edges and curvature needs to be done
with regularised differential operators, i.e. Gaussian derivative kernels |H|. The 
centre of a hypo-echoic follicle is characterised as a singularity of the luminance 
field; here the intensity (of the observed, thus multiscale, blurred image) reaches 
a minimum. A singularity is defined as a point where the intensity gradient 
vanishes. The local isophote curvature reaches infinity, due to the vanishing of 
the gradient. Singular points are important topological structural descriptors of 
images, especially when their behaviour is studied as a function of scale. They 
are used in the next step after local features detection, as important nodes in a 
perceptual grouping process, when the multiscale context of pixels needs to be 
taken into account. 

The detection of singular points can conveniently be done by studying the 
so-called winding numbers. From the theory of vector fields m important theorems
exist (Stokes' and Gauss') giving the relation between something happening
in a volume and just on its surface, i.e. we can detect the singularities by mea- 
surements around the singularity. To explain the notion, we start in 2D. Image 
intensity is denoted by $\xi$, the gradient is denoted in index notation by $\xi_i$, where
indices always run over the dimensions: $\xi_i = \{\partial\xi/\partial x, \partial\xi/\partial y\}$. The winding
number $\nu$ is defined as the number of times the image gradient vector rotates over $2\pi$
when we walk around the point: i.e. we integrate over a closed path, indicated
by $\partial W$, the increments of angle $d\alpha$ the vector is making:

$$\nu = \frac{1}{2\pi} \oint_{\partial W} d\alpha .$$

Fig. 2. Small rotation of a vector along the contour
of the closed path around the point of study

From Fig. 2 we see:

$$\tan\alpha = \frac{\xi_2}{\xi_1} .$$

We expand the left- and right-hand side of the last equation in a Taylor series
up to first order in $\alpha$ and $\xi_i$, respectively. For the left-hand side we obtain

$$\tan(\alpha + \Delta\alpha) = \tan\alpha + \frac{1}{\cos^2\alpha}\,\Delta\alpha + O(\Delta\alpha^2)$$

and for the right-hand side we obtain

$$\frac{\xi_2 + \Delta\xi_2}{\xi_1 + \Delta\xi_1} = \frac{\xi_2}{\xi_1} + \frac{\xi_1\,\Delta\xi_2 - \xi_2\,\Delta\xi_1}{\xi_1^2} + O(\Delta\xi_i^2) .$$

Taking the limit $\Delta\alpha \to 0$ and using the expression for $\tan\alpha$ we get

$$d\alpha = \frac{\xi_1\,d\xi_2 - \xi_2\,d\xi_1}{\xi_1^2 + \xi_2^2} .$$

In our case we consider a unit gradient vector, so $\xi_1^2 + \xi_2^2 = 1$; using subscript
notation we obtain $d\alpha = \epsilon^{ij}\,\xi_j\,d\xi_i$, where $\epsilon^{ij}$ is the antisymmetric tensor
$\{\{0, -1\}, \{1, 0\}\}$.

The rotation is always an integer number times $2\pi$ (in 2D), which gives
interesting robustness through rounding. In 3 dimensions we calculate the space
angle of the gradient $\xi_i\,d\xi_j \wedge d\xi_k$, where we recognise the gradient $\xi_i$ and a directed
infinitesimal surface element $d\xi_j \wedge d\xi_k$. This is a so-called wedge ($\wedge$) product
(see e.g. [13]). We integrate these surface elements now over the closed surface
around our point of study, and see how often a full space angle of $4\pi$ is reached.
This is then the 3D winding number.

In practice, in 3D it is calculated as follows: we investigate the 26 voxels
around a specific voxel. The form is defined in 3D as

$$\Phi = \epsilon^{ijk}\,\xi_i\,\partial_l \xi_j\,\partial_m \xi_k\; dx_l \wedge dx_m .$$

Indices in pairs are summed over the dimensions, which process is called
contraction of indices (the summing symbol in front of the equation is routinely
left out: the so-called Einstein convention). Performing the contraction of indices
on $l$ and $m$ gives

$$\begin{aligned}
\Phi = \epsilon^{ijk}\,\xi_i \big(\; & (\partial_x \xi_j\,\partial_y \xi_k - \partial_y \xi_j\,\partial_x \xi_k)\, dx \wedge dy \\
+\; & (\partial_y \xi_j\,\partial_z \xi_k - \partial_z \xi_j\,\partial_y \xi_k)\, dy \wedge dz \\
+\; & (\partial_z \xi_j\,\partial_x \xi_k - \partial_x \xi_j\,\partial_z \xi_k)\, dz \wedge dx \;\big) .
\end{aligned}$$

This expression has to be evaluated for all voxels of our closed surface. We can
do this e.g. for the 6 planes of the surrounding cube. On the surface $z =$ constant
the previous equation reduces to

$$\Phi = \epsilon^{ijk}\,\xi_i\,(\partial_x \xi_j\,\partial_y \xi_k - \partial_y \xi_j\,\partial_x \xi_k)\, dx \wedge dy .$$

Performing the contraction on the indices $i$, $j$ and $k$ gives

$$\begin{aligned}
\Phi = \big[\; & 2\xi_x (\partial_x \xi_y\,\partial_y \xi_z - \partial_x \xi_z\,\partial_y \xi_y) \\
+\; & 2\xi_y (\partial_x \xi_z\,\partial_y \xi_x - \partial_x \xi_x\,\partial_y \xi_z) \\
+\; & 2\xi_z (\partial_x \xi_x\,\partial_y \xi_y - \partial_x \xi_y\,\partial_y \xi_x) \;\big]\, dx \wedge dy .
\end{aligned}$$

The gradient vector elements $\xi_i = \{\partial\xi/\partial x, \partial\xi/\partial y, \partial\xi/\partial z\}$ can be
calculated e.g. by neighbour subtraction, as can be done in a similar way for the
derivatives of the gradient field, e.g. $\partial_x \xi_y = \partial\xi_y/\partial x$. The single pixel steps $dx$
and $dy$ are unity.

The general theory comes from homotopy theory, where so-called topological 
homotopy class numbers are defined 19111)1111 . In d dimensions we again see how 
these reflect the behaviour of the intensity gradient in a close neighbourhood 
$\partial W$ around a given point. So, the $d$-dimensional homotopy class number $\nu$ of an
image pixel over the surface $\partial W$ of the small environment $W$ around the point
is defined as follows:

$$\nu = \oint_{\partial W} \Phi ,$$

expressed in units of the full angle ($2\pi$ in 2D, $4\pi$ in 3D). There are no
singularities on $\partial W$. For regular points, i.e. when no singularity
is present in $W$, the winding number is zero, as we see from Stokes' theorem:

$$\Phi = \xi_{i_1}\, d\xi_{i_2} \wedge \dots \wedge d\xi_{i_d}\; \epsilon^{i_1 i_2 \dots i_d}, \qquad \text{Stokes:} \quad \oint_{\partial W} \Phi = \int_{W} d\Phi = 0,$$

where the fact that the $(d-1)$-form $\Phi$ is a closed form was used. So, as most of
our datapoints are regular, we detect singularities very robustly as integer values
embedded in a space of zeros.




Fig. 3. The direction of the gradient as a vectorfield for a minimum (upper left), saddle-
point (upper right) and monkeysaddle (lower left) in a 2D spatial intensity distribution.
The number of full rotations of the gradient vector tracing a path around a point is the
winding number $\nu$, here indicated as multiples of one full rotation $2\pi$ of the gradient
vector. All regular points give rise to $\nu = 0$. The centre of a follicle is a singular point
in 3D, i.e. a minimum with $\nu = 1$
The winding number has nice properties:

— Within the closed contour there is a conservation of winding number; when 
we enclose a saddlepoint and a minimum, we measure the sum of the winding 
numbers (they sum to zero in this case as we are close to their annihilation); 

— The winding number is independent of the shape of $\partial W$; it is a topological
entity;

— The winding number only takes integer values as multiples of the full rota- 
tion angle; even when the numerical addition of angles does not sum up to 
precisely an integer value, we may rightly round off to the nearest integer; 

— The winding number is a scaled notion, the neighbourhood defines the scale; 

— The behaviour over scale of winding numbers generates a treelike structure 
which shows typical annihilations, creations and collisions, from which much 
can be learned about the ‘deep structure’ of images; 

— The winding number is very easy to compute, in any dimension; 

— The winding number is a robust characterisation of the singular points in 
the image: small deformations have a small effect. 
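The 2D case can be illustrated by a minimal sketch that adds the wrapped angle increments of the gradient direction along the closed 8-neighbour path around a pixel and rounds the result to an integer. The function name, the use of `numpy.gradient` for the neighbour subtraction and the choice of path are assumptions made for illustration; in practice the image would first be blurred to the scale of interest.

```python
import numpy as np

# Closed counter-clockwise path over the 8 neighbours, as (dx, dy) offsets,
# with x along the columns and y along the rows of the image array.
PATH = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def winding_number_2d(image, row, col):
    """Number of full rotations of the gradient along the path around a pixel."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))  # central differences
    angles = [np.arctan2(gy[row + dy, col + dx], gx[row + dy, col + dx])
              for dx, dy in PATH]
    total = 0.0
    for a0, a1 in zip(angles, angles[1:] + angles[:1]):
        # Wrap each increment into [-pi, pi] before adding it.
        total += np.arctan2(np.sin(a1 - a0), np.cos(a1 - a0))
    # Rounding makes the result robust against small numerical errors.
    return int(round(total / (2.0 * np.pi)))
```

On a synthetic paraboloid $x^2 + y^2$ this returns 1 at the minimum and 0 at a regular point, and on the saddle $x^2 - y^2$ it returns -1 at the centre, in line with the conservation property listed above.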




Fig. 4. 2D echographic slices from the 3D dataset. Follicles appear as black circles 
(yellow (white) arrows). Detected follicle centres are marked yellow (white), oversized 
for clarity. Length arrow: 1 cm 

In 1D the homotopy class number boils down to the difference of the sign
of the signal's second derivative taken from the left and from the right. We will
use the ‘edge focusing’ multiscale behaviour of this 1D number in the sequel for
the characterisation of multiple points on the surface of the follicle. The theory
of homotopy numbers can easily be extended to subdimensional manifolds
(strings, surfaces) in higher dimensions and for other vectorfields, such as the 
frames spanned by the eigenvectors of the Hessian, or any other well defined 
vectorfield 0. 
5 The Detection of Follicle Centres and Boundary Points
by 3D and 2D Topological Numbers

The winding number in 3D is computed by adding the increments in orientation 
angle of the gradient vector when moving the gradient over a closed surface 
around the point of study, e.g. along the 6, 8 or 26 neighbouring pixels. Because 
we need detection of the minima of extended structures, i.e. the follicles, which 
are much larger than the noise grains of the raw data, we need to move to
a higher scale of analysis (i.e. blur). We perform the following steps for the 
automatic detection of the follicles: 

— isotropic blurring of the 3D ultrasound data, and the establishment of an 
optimal scale in the sense of minimising the number of false-positives; a too 
small scale gives too many minima, a too large scale too few; 

— calculation of 3D winding numbers as estimators for the follicle centres; this 
gives their number and spatial distribution; 

— generation of 200-500 radial rays in a homogeneous orientation distribution 
from these centres, and determination of the most pronounced 1D intensity
edge along each ray by its ‘lifetime’ over scale;

— fit spherical harmonics functions to some order to the detected endpoints in 
order to get an analytical description of the shape of the follicle; from this 
the volume can easily be calculated, and statistics on shape. 
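
The ray generation step above requires directions that are spread roughly evenly over the sphere. One common way to obtain such a set is a Fibonacci (golden-ratio) lattice; this is an assumed stand-in, since the text does not state how the homogeneous orientation distribution was generated, and `ray_directions` is a hypothetical name.

```python
import numpy as np

def ray_directions(n):
    """Return n unit vectors spread approximately evenly over the sphere."""
    k = np.arange(n) + 0.5
    z = 1.0 - 2.0 * k / n                 # equal-area bands along z
    phi = np.pi * (1.0 + 5.0 ** 0.5) * k  # golden-ratio azimuth increments
    r = np.sqrt(1.0 - z**2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

directions = ray_directions(225)
```

Each row can then be scaled by the step length along the ray to sample a 1D intensity profile starting at a detected centre.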

In Fig. 4 we show a typical result for a patient dataset; the detected winding
numbers are marked as yellow (white) dots, indicated by arrows. The winding
numbers do not show up in all follicles because only the slices through the
follicle centres are shown. From the winding number locations, 200-500 rays
(1D profiles) are drawn in all directions, with a maximum length of 32 pixels.
The search for the most prominent contrast step along the rays is done by edge
focusing of the 1D winding number over scales 0 to 2 pixels in increments of 0.1.
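
The edge focusing along a single ray can be sketched as follows. This is a simplified illustration under stated assumptions: the coarse-scale edge is selected by its gradient magnitude rather than by its lifetime over scale, the tracking window of 2 pixels is hypothetical, and the scale 0 is skipped. The names `gaussian_smooth`, `zero_crossings` and `focus_edge` are not from the original implementation.

```python
import numpy as np

def gaussian_smooth(signal, sigma):
    """Blur a 1D profile with a sampled, normalised Gaussian kernel."""
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(signal, dtype=float), radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def zero_crossings(signal, sigma):
    """Sample indices where the second derivative at scale sigma changes sign."""
    second = np.diff(gaussian_smooth(signal, sigma), n=2)
    sign = np.sign(second)
    return np.where(sign[:-1] * sign[1:] < 0)[0] + 1

def focus_edge(profile, scales, window=2):
    """Pick the strongest edge at the coarsest scale and trace it to the finest."""
    coarse = gaussian_smooth(profile, scales[0])
    candidates = zero_crossings(profile, scales[0])
    if candidates.size == 0:
        return None
    slope = np.abs(np.gradient(coarse))
    position = int(candidates[np.argmax(slope[candidates])])
    for sigma in scales[1:]:
        crossings = zero_crossings(profile, sigma)
        if crossings.size:
            nearest = int(crossings[np.argmin(np.abs(crossings - position))])
            if abs(nearest - position) <= window:
                position = nearest
    return position

# A 32-pixel ray leaving a dark follicle (0) into brighter tissue (1).
scales = [k / 10 for k in range(20, 0, -1)]   # 2.0, 1.9, ..., 0.1 pixels
clean = np.where(np.arange(32) < 20, 0.0, 1.0)
noisy = clean + np.random.default_rng(0).normal(0.0, 0.1, clean.size)
edge_clean = focus_edge(clean, scales)
edge_noisy = focus_edge(noisy, scales)
```

Starting at the coarse scale suppresses the noise extrema, and the stepwise descent recovers the localisation that the blurring would otherwise degrade.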




Fig. 5. Hierarchical multiscale edge detection. Left: noisy 1D intensity profile. Right:
sign of the second derivative (subtraction of neighbours) as a function of scale
(black -, white +). Scale (vertical axis) ranges from 1 to 5 pixel units. Note the
closure of the extrema (causality in a linear scale-space)

Fig. 6. Left: 3D scatterplot of the detected edgepoints of three bovine ovary follicles.
Right: the corresponding fitted spherical harmonics. Note the irregular shape of the 
follicles 

Figure 5 shows an example where seemingly no edge is present but where
at high scale one edge emerges despite the noise, which is traced down to the
lowest scale. The longest lifetime is taken as a measure of importance of the
edge, indicating a follicle boundary point. When more than one edge survives at
σ = 2 pixels, we take the edge at σ = 2 pixels which is closest to the centre of
the follicle. The detected edgepoints are then fitted with 3D spherical harmonic
functions using Mathematica^1. Spherical harmonics are orthogonal
functions with respect to integration over the surface of the unit sphere, and
form a natural basis for the description of (in our case convex) 3D shape. The
advantages of spherical harmonics are the wide range of shapes that can be
modelled, giving explicit knowledge about the deviation from a pure spherical
shape, and that the volume of the follicle can easily be calculated by analytical
integration.
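
The fit and the volume integral can be sketched as follows. The sketch makes two assumptions: it uses polynomials in the unit direction vector that span the same space as the real spherical harmonics up to order 2 (instead of the 4th order used here), and it evaluates V = (1/3) ∫ r(θ, φ)^3 sin θ dθ dφ numerically rather than analytically. The function names are illustrative.

```python
import numpy as np

def basis(theta, phi):
    """Nine functions spanning the real spherical harmonics up to order 2."""
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    return np.stack([np.ones_like(x), x, y, z,
                     x * y, y * z, x * z, x**2 - y**2, 3 * z**2 - 1], axis=-1)

def fit_radius(theta, phi, radius):
    """Least-squares coefficients of the radius function r(theta, phi)."""
    coeffs, *_ = np.linalg.lstsq(basis(theta, phi), radius, rcond=None)
    return coeffs

def volume(coeffs, n=200):
    """Volume enclosed by r(theta, phi), by the midpoint rule on an n x 2n grid."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    phi = (np.arange(2 * n) + 0.5) * np.pi / n
    t, p = np.meshgrid(theta, phi, indexing="ij")
    r = basis(t, p) @ coeffs
    return np.sum(r**3 * np.sin(t)) * (np.pi / n) ** 2 / 3.0

# Synthetic edgepoints: 225 random directions on a slightly egg-shaped surface.
rng = np.random.default_rng(0)
theta = np.arccos(rng.uniform(-1.0, 1.0, 225))
phi = rng.uniform(0.0, 2.0 * np.pi, 225)
radius = 5.0 + 0.5 * np.cos(theta)
fitted_volume = volume(fit_radius(theta, phi, radius))
```

For this test surface the exact volume is 505π/3, so the sketch can be checked against a closed-form value.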

Figure 6 shows a set of detected points, and the corresponding 3D fit by
spherical harmonics. The method was tested on artificial data: 3 spherical test
follicles with diameters of 3, 6 and 12 pixels (intensity 0) in a 64^3 pixel cube
(background intensity 1) with additive and uncorrelated Gaussian noise of σ =
0.1, 0.25, 0.5 and 1.0 intensity unit. Figure 7 shows a plane with some of the
225 edgepoints detected by edge focusing for the largest follicle in the noisy test
dataset (σ = 0.25) and a blurring scale of 6 pixels. The detection works well, and
the average radii were correct within half a pixel for all tests. The detection of
follicle minima from the 3D US data by 3D winding numbers is scale-dependent
and we need multiple scales. Some follicles are only detected at small scales,
others at large scales.



^1 Mathematica commands to generate the spherical harmonics to 4th order and do the
fitting:

fitset = Flatten[Table[SphericalHarmonicY[l, m, θ, φ], {l, 0, 4}, {m, -l, l, 1}]];
fitted[θ_, φ_] = Fit[data, fitset, {θ, φ}];
ParametricPlot3D[fitted[θ, φ] {Cos[θ], Sin[θ] Cos[φ], Sin[θ] Sin[φ]},
  {θ, 0, π}, {φ, 0, 2 π}];

Fig. 7. Edgepoints are successfully detected on the surface of the follicle in the
noisy testset. Background intensity 1, follicle intensity 0. Additive uncorrelated
Gaussian noise: σ = 0.25. Blurring scale 6 pixels. Image resolution 64^3 pixels.
The detected edgepoints are indicated as dots on the ultrasound image



Figure 8 shows the number of detected minima for blurring scales 3, 4, 6 and 10
pixels. At σ = 3 pixels, many minima are detected, but also many false positives.
At scale σ = 4 pixels (without scale σ = 3 pixels) only one minimum is missed. At
σ = 6 pixels we have few errors, but also few new detections. At σ = 10 pixels
we have no errors, but we only detect the large follicle(s). These cannot be seen at
smaller scales, because minima cannot be detected at a small scale in a
homogeneous region of a follicle (i.e. the follicle is large relative to the size of
the operator). This leads to the conclusion that two scales suffice: σ = 3 and
σ = 10 cover the detection range well for the 3D US datasets. Processing a
typical dataset (150 x 150 x 150 pixels) takes about 1 minute per scale on a
300 MHz Pentium II PC.

Fig. 8. Upper left: number of detected points as a function of scale. Upper
right: cumulated number of detected points. Lower left: diagram of detected
follicles in bovine ovary II as a function of the blurring scale (vertical scale,
in pixels)



False positive winding numbers also generated 225 edgepoints which should 
be discarded. If such a winding number emerged as a noise minimum, the set of 
edgepoints can be tested for roundness i.e. discriminated by the large variance 
of the detected radii length, or a test can be done on the (low) intensity of the 
internal pixels of the pseudo-follicle. This turned out to be very difficult due 
to the great variability in echo amplitude output. If the false winding number 
is due to another structure (vessel, another follicle) the shape derived from the 
edgepoint fit may discriminate. We have not performed this test yet. However,
if the data is cut off in such a way as only to include the ovary, such detections
are unlikely to occur. We employed this strategy as much as possible, because it 
is a fairly easy task to restrict the Cartesian volume to the ovarian space after 
scanning. 
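
The roundness test on the detected radii mentioned above could take the following form. It is a sketch only: the coefficient of variation as the measure and the threshold `max_cv = 0.25` are assumptions for illustration, not values used or validated in this study.

```python
import numpy as np

def looks_round(centre, edgepoints, max_cv=0.25):
    """Accept a candidate follicle when its radii vary little around their mean."""
    offsets = np.asarray(edgepoints, dtype=float) - np.asarray(centre, dtype=float)
    radii = np.linalg.norm(offsets, axis=1)
    return bool(radii.std() / radii.mean() <= max_cv)

# Six edgepoints along the coordinate axes around a centre at (10, 10, 10).
axes = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                 [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
centre = np.array([10.0, 10.0, 10.0])
round_points = centre + 5.0 * axes
ragged_points = centre + axes * np.array([1, 10, 1, 10, 1, 10])[:, None]
```

A candidate arising from a noise minimum is expected to produce widely scattered radii and would be rejected by such a test.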

The algorithm was implemented in a universal image analysis program written
in Borland C++ (Image Explorer by van Ginneken and Staal (ISI)).

6 Validation of the Detection Procedures with Bovine 
Ovaries 

To calibrate the full range of steps of the automatic detection procedure, two 
fresh bovine ovaries with multiple follicles of widely different diameters were 
used as test objects. The thumbsize ovaries were subsequently: 

— scanned with 3D-US, immersed in physiological salt solution and with the 
3D ultrasound probe at the same typical object-probe distance (2-4 cm) 
as applied on female patients; three subsequent acquisitions with different 
probe positions; 

— scanned with high resolution FSE (fast spin-echo pulse sequence) MRI: slice 
thickness 0.5 mm, non-overlapping, in-plane pixel size 100 x 100 µm, TR = 20
ms, TE = 75 ms (an example 2D slice from the 3D MRI is given in Fig. 9);

— sliced with a microtome, embedded in CMC (carboxymethylcellulose) and 
after freezing (-30 to -35 °C), into anatomical coupes of 25 micron thickness.
Image resolution 1528 x 1146 at 100-micron intervals with a digital
camera (300 images, no histological staining). 




Fig. 9. Calibration of the automated method with two bovine ovaries. Left: Anatomical 
coupe. Middle: Coronal MRI, FSE. Right: 3D surface rendering of the follicles from 
the MRI acquisition shows their spatial relationship. Segmentation by thresholding 

We analysed the 3D-US data with scales σ = 3, 4, 6 and 10 pixels. The
cumulated number of points detected as a function of scale for the first bovine
ovary is indicated in Fig. 8 (left). For σ = 3 we find 11 follicles detected; when
scale σ = 4 is also included, we find 12. Adding σ = 6 we find 13, and adding
σ = 10 adds no extra follicle. If we start from a slightly higher scale, i.e.
σ = 4, we see in Fig. 8 (right) that only 5 follicles are detected; adding
σ = 3 then gives 12 detected. Scales σ = 4 and σ = 6 alone give 7 detected
follicles, and scales σ = 3 and 6 alone give 13 detected. The conclusion here is
that two scales suffice: σ = 3 and σ = 10. If computing time is no penalty, σ = 6
could be added.

The images from MRI and anatomical slices were analysed with standard image
visualisation and measurement tools. The 3D-ultrasound data were acquired
three times individually. The method of winding numbers introduces negligible
dislocation of the minima despite the wide range of blurring scales, as can be seen
in Table 1, where the x, y, z co-ordinates, the distances between the minima and
the volumes of the three largest follicles of each acquisition are given. The average
diameters (over 3 perpendicular directions) of the test follicles were measured
after identification of the follicles in the corresponding 3D ultrasound datasets.
The volumes measured from the MRI and anatomical data are estimated from
the average diameters, assuming a spherical shape.
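
The two calculations described above are simple enough to check directly. The sketch below recomputes the inter-centre distances for acquisition v00 from the co-ordinates listed in Table 1, and gives the volume of a sphere from an average diameter; the function names are illustrative.

```python
import math

# Centre co-ordinates (pixels) of the three largest follicles, acquisition v00.
centres_v00 = [(99, 51, 35), (93, 28, 44), (113, 55, 74)]

def neighbour_distances(centres):
    """Distance from each centre to the next one, cyclically."""
    return [math.dist(p, q) for p, q in zip(centres, centres[1:] + centres[:1])]

def sphere_volume(mean_diameter):
    """Volume of a sphere with the given (average) diameter."""
    return math.pi * mean_diameter**3 / 6.0

distances_v00 = [round(d, 1) for d in neighbour_distances(centres_v00)]
```

The rounded distances reproduce the values 25.4, 45.0 and 41.6 pixels given in the table for this acquisition.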



Table 1. x, y, z co-ordinates in pixels of the centres of the three largest follicles, from
three individual 3D ultrasound acquisitions (v00, v01 and v44) of bovine ovary I.
Independent measurements. Note the accurate correspondence in the calculated distances
between the winding number points, indicating independence of scale-dependent
dislocation. The difference in volume from the spherical harmonics fit was about 4% for the
two larger, and 10% for the smallest follicle. The three methods of volume measuring
compare very favourably



Follicle   x        y        z        distance to   volume from   volume     volume
#          centre   centre   centre   neighbour     spherical     from MRI   from
                                      (pixels)      harmonics                anatomy
                                                    (mm^3)

v00        99       51       35       25.4          259.7         250.0      262.1
           93       28       44       45.0           28.4          27.0       29.8
           113      55       74       41.6           56.2          54.9       59.3

v01        33       41       66       25.3          242.3
           47       22       75       44.5           34.0
           [...]    49       44       38.9           54.7

v44        72       49       84       25.4          239.7
           [...]    28       70       45.2           28.4
           32       51       82       40.1           59.3


7 Patient Data 

This study focuses on the methodology to automatically count and analyse the 
follicles from the 3D-ultrasound data, and only limited patient studies have been 
carried out so far. The follicle count results on 10 patients are shown in Table 2.
Each dataset was cut off to include only the ovary immediately after scanning 
by an experienced echographer, and automatic and human expert counts were 
compared. We are currently finalising a clinical PC-based system with a user- 
friendly user-interface. In a next phase of the study the accuracy and efficacy of 
the method will be evaluated on a large patient group. 

8 Conclusion and Discussion 

The automatic detection of follicles from 3D ultrasound data is not an easy
task, given the strong noise characteristics of the ultrasound signal, the size
and contrast of the follicles and the follicle-like structures in and at the
immediate neighbourhood of the ovary. Clinically, the most important parameter
is the number of follicles. Still, with the introduction of multiscale topological
and edge focusing methods we are able to extract the follicles automatically with
95% accuracy. A good cut-off by the operator immediately after scanning, leaving
only the ovary region in the resulting 3D datacube, is an important step towards
a higher score.

Table 2. Performance of the algorithm compared with a human expert. Number of
follicles found. Data for 6 patients. The datasets are cut off to contain only the ovary.
Scales used: σ = 3.6, 4.8, 7.2 and 12 pixels

Patient #   Manual   Computer      Patient #   Manual   Computer

1           17       15            4           14       9
2           10       8             5           9        7
3           7        5             6           9        7


We exploit knowledge about the convex shape of the follicles by means of
spherical harmonics function fitting. This can be exploited in other ways, such
as by a Hough transform, parametrically deformable contours [?] or scale-space
primal sketch blobs, each over a range of scales. Our approach however enables a
much better accuracy of shape description than e.g. a 5 parameter Hough
transform for ellipsoidal shapes. The approach is computationally efficient,
with just sign differences over 1D signals. The method can now be applied in a
larger scale clinical setting, which is scheduled as a next phase of the project.

Abstract. It has previously been demonstrated that using 3-D rather
than 2-D ultrasound can increase the accuracy of volume measurements. 
Unfortunately, the time required to produce them is also increased. While 
freehand 3-D ultrasound allows complete freedom of movement during 
scanning, the resulting B-scans are generally resampled onto a low reso- 
lution, regular voxel array before subsequent processing — increasing the 
time even further. In contrast, sequential freehand 3-D ultrasound does 
not require a voxel array, and hence both the data resolution and the 
processing times are improved. Such a system is presented here, incorpo- 
rating three novel algorithms, each operating directly on non-parallel 
B-scans. Volume is measured using Cubic planimetry, which requires 
fewer planes than step-section planimetry for a given accuracy. Maximal 
disc guided interpolation can be used to interpolate non-parallel cross- 
sections. Regularised marching tetrahedra can then be used to provide a 
regular triangulation of the zero iso-surface of the interpolated data. The 
first of these algorithms is presented in detail in this paper.



1 Introduction 

There has been much research in the last two decades on systems which al- 
low the construction and visualisation of three dimensional (3-D) images from 
medical ultrasound data. One of the more compelling applications where 3-D 
ultrasound can provide a real benefit is in the accurate measurement of volume. 
This is important in several anatomical areas, for instance the heart [?],
foetus [?], placenta [?], kidney [?], prostate [?], bladder and eye [?]. Measurements
have traditionally been made with 2-D ultrasound, but it is generally accepted 
that 3-D ultrasound can provide much greater accuracy. 

Freehand 3-D ultrasound allows the clinician unrestricted movement of the 
ultrasound probe. The ultrasound images (B-scans) are digitised and stored in 
a computer. In addition, the position and orientation of the probe is measured 
and recorded with each B-scan. The various 3-D ultrasound systems are reviewed 
in [?]. One of the disadvantages of freehand scanning is that the recorded B-scans
are not parallel. This makes processing of the data more complex, hence most
systems interpolate this data to a regular 3-D voxel array, or cuberille. However,
this can take considerable time and generate potentially misleading artifacts. 

By contrast, in sequential freehand 3-D ultrasound, the original B-scan data, 
and the order of acquisition of the B-scans, are maintained throughout the sub- 
sequent processing. This reduces the time from scanning to display, at a cost 
of a slight increase in processing time for each display. Moreover, any sequential
method which does not require human interaction^1 has the potential to
be performed during scanning, greatly decreasing the residual (post scanning) 
processing time. 

It has already been demonstrated that re-slice displays (i.e. 2-D displays in 
new orientations) and panoramic displays (i.e. 2-D displays with extended coverage)
can be performed efficiently by sequential methods [14]. Resampling is only
performed once, rather than once to the cuberille and once again to the viewing 
plane, which leads to increased quality displays. This paper demonstrates that 
volume measurements and organ surfaces can also be efficiently estimated in a 
sequential manner. Segmentation remains the most complex and time consum- 
ing step in this process. In view of this, the proposed algorithms are designed for 
sparse cross-sections, to limit the time spent segmenting, in non-parallel planes, 
so the segmentation can be performed in the original B-scans (which do not 
suffer from interpolation artifacts). Reducing total organ volume measurement 
time is particularly important in a clinical setting. 



2 Volume Measurement Using Ultrasound 

2.1 Sequential Volume Measurement from Scan Plane Data 

Volume measurement using conventional 2-D ultrasound is achieved by approx- 
imating the organ of interest as an ellipsoid, or some other simple shape, and 
estimating the main dimensions from appropriate B-scans. A correction is then 
made to the result, dependent on the organ, the age and sex of the patient and 
other factors. There are many formulations for the resulting equations [?].

Ellipsoid formulae are easy to use, but they make geometrical assumptions 
about the shape of a given organ, leading to errors in the volume measurement 
which can be greater than 20%. Planimetry is an alternative approach, made 
possible with 3-D ultrasound, in which object cross-sections are outlined on each 
scan plane, and the volume is calculated from the cross-sectional areas and plane 
positions. The most common implementation of this is step-section planimetry, 
which assumes that the cross-sections are parallel. 
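
Step-section planimetry as described above can be sketched in a few lines. The sketch assumes each outline is a closed polygon given as a list of (x, y) vertices in the scan plane and that the planes are parallel and equally spaced; the function names are illustrative.

```python
def polygon_area(outline):
    """Area of a closed 2-D outline (list of (x, y) vertices), shoelace formula."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(outline, outline[1:] + outline[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def step_section_volume(outlines, spacing):
    """Sum of cross-sectional areas multiplied by the plane spacing."""
    return sum(polygon_area(outline) for outline in outlines) * spacing

# Five identical 2 x 2 square outlines, 0.5 units apart.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
volume_estimate = step_section_volume([square] * 5, spacing=0.5)
```

No assumption is made about the overall shape of the organ, which is the source of the accuracy advantage over ellipsoid formulae.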

There are numerous reports which indicate that step-section planimetry is
much more accurate than ellipsoid or other geometrical formulae [?]. In one
exception, planimetry was compared with 16 equations for measuring prostatic
volume and π/6 (transverse diameter)^2 (anteroposterior diameter) was found to be
marginally more accurate [?]. However, planimetry has recently been shown to
have much better intra- and inter-observer variability [?].

^1 All the algorithms presented here are fully automatic, save segmentation.

Freehand 3-D ultrasound does not generate data on parallel planes. Volume 
measurement is most often achieved by interpolating to form a cuberille, then 
segmenting the entire cuberille, resulting in a set of parallel cross-sections whose 
volume can be measured using step-section planimetry. This is equivalent to 
counting the voxels inside the object. 

There are two alternatives to this approach, which do not require the creation 
of a cuberille, and hence can in general be performed sequentially. The first is an 
extension of planimetry developed by Watanabe [?] to non-parallel, and even
overlapping, cross-sections, which has been used to determine the volume of the
prostate [?]. The second, and more common, is to estimate the surface directly
from the cross-sections, then calculate the volume of this surface. 



2.2 Volume Measurement from Surface Reconstructions 

Surface estimation is generally achieved by triangulation between neighbouring 
cross-sections. Such techniques have been developed for parallel cross-sections, 
but can usually be adapted for slightly non-parallel cases. Even for parallel cases, 
estimating the surface in a robust manner is difficult, which is evident from the 
wealth of related literature. 

Once the surface of an object has been estimated, the volume can be calcu- 
lated by several different methods, dependent on surface representation. 

Tetrahedral Volume. If the surface has been estimated by forming tetrahedra,
the volume can be calculated from the sum of the volumes of these tetrahedra.
Alternatively, the polyhedral approximation formula developed in [?]
can be used. This is based on tetrahedral volume, but formulated in terms of
the points making up the object cross-section on each plane. Although this
appears to allow volume calculation from cross-sections without triangulation,
in fact a simple triangulation is assumed in the algorithm which will
only be correct for simple shapes. This technique is used, for instance, in [?].

Cylindrical/Pyramidal Volume. If the scanning pattern is rotational, parts
of cross-sections can be connected with the mid-point of the rotation to
form pyramidal or cylindrical part sections, from which the volume can be
calculated. This technique has been used for the eye [?]. Moritz also applied
it to freehand scans by re-sampling these scans in a rotational pattern and
then calculating the volume from the new cross-sections [?].

Volume from Triangulation. Hughes has suggested two ways of measuring
the volume directly from a triangulated surface. ‘Ray Tracing’ involves
projecting rays from a 2-D grid through the object, and calculating the volume
from the length of the part of each ray contained by the object [?].
Alternatively, a discrete version of Gauss’ theorem can be adapted to calculate the
volume component for each individual triangle such that the sum of these
components is equivalent to the object volume [10].

Comparisons of some of the various freehand volume measurement techniques
have been performed [?]; it is clear that any of the non-geometrical methods are
to be preferred over the ellipsoid equations [?]. There are two areas that warrant
further research, in the context of sequential volume measurement. Firstly, it is
suggested in the literature [?] that using a cubic interpolant would increase
the accuracy of the, already flexible, non-parallel planimetry technique. However,
this has never previously been reported. Secondly, a sequential surface estimation 
algorithm is required which can handle cross-sections with the same complex 
topology and arbitrary orientation as this planimetry technique. 
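
The discrete Gauss' theorem approach to the volume of a triangulated surface, mentioned above, can be sketched as a sum of signed tetrahedron volumes. The sketch assumes a closed mesh whose triangles are consistently ordered so that their normals point outwards; it is a generic formulation, not necessarily the variant used in the cited work.

```python
import numpy as np

def mesh_volume(vertices, triangles):
    """Volume enclosed by a closed, outward-oriented triangle mesh."""
    v = np.asarray(vertices, dtype=float)
    t = np.asarray(triangles, dtype=int)
    a, b, c = v[t[:, 0]], v[t[:, 1]], v[t[:, 2]]
    # Each triangle contributes the signed volume of the tetrahedron it
    # forms with the origin; the contributions sum to the enclosed volume.
    return np.einsum("ij,ij->i", a, np.cross(b, c)).sum() / 6.0

# Unit right-angled tetrahedron, faces ordered with outward normals.
tetra_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
tetra_volume = mesh_volume(tetra_vertices, tetra_faces)
```

Because the mesh is closed, the result does not depend on where the origin lies, so the same value is obtained after translating all vertices.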

3 A Sequential Volume Measurement System 

A fast and accurate volume measurement system has been incorporated into
Stradx^2 [14]. Stradx is a flexible sequential freehand 3-D ultrasound tool which
can be used to grab ultrasound video images and orientation information and
display these in various ways, including re-slicing and panoramic displays, without
creating a cuberille. The volume measurement system includes three novel
algorithms for estimating volume, interpolating segmented data and triangulating
the iso-surface of this data, all from the original freehand B-scans. The first
of these algorithms, for estimating volume, is presented in detail in Sect. 4.




Fig. 1. Stradx v5.2 interface 



^2 http://svr-www.eng.cam.ac.uk/~rwp/stradx/

Figure 1 is an example of the interface provided for this purpose. The steps
in the volume estimation are as follows.

Manual segmentation. This is performed by the clinician on a subset of the
original B-scans, using a mouse (the review_window in Fig. 1). Although
many automatic segmentation algorithms have been investigated, none are
flexible enough to be used for generic ultrasound data in a manner which
is faster than manual segmentation. This is no surprise: often the clinician
will draw a cross-section through apparently featureless data, guided
only by their prior knowledge of the organ shape and a 3-D reconstruction
they have in their head. Semi-automatic segmentation methods which do
use prior models are generally difficult to use, because it is hard to make
the model both flexible enough to be used in different circumstances and
detailed enough to be helpful.

Real-time display of cross-sections. The cross-sections are displayed in 3-D
wire-frame format as they are drawn (the outline_window in Fig. 1). This
provides feedback on both the shape and the spacing of the cross-sections,
allowing the clinician to concentrate on the more complicated areas.

Real-time volume estimate. As the cross-sections are completed, a real-time
volume estimate is calculated using cubic planimetry (the volume in the
outline_window of Fig. 1), described in more detail in Sect. 4.

Surface estimation and display. Estimation and display of the object surface
is useful for giving the clinician confidence in the segmentation. In order
to achieve this, a distance field is calculated from the cross-sections, and
an iso-surface triangulation algorithm used to extract the zero surface of
this field (the surface_window of Fig. 1). This is a more robust method
than triangulating the cross-sections directly. The distance field is calculated
from the non-parallel cross-sections using maximal disc guided shape based
interpolation, described in [?], which is a fast, simple method that can handle
complex cross-sections. The iso-surface is triangulated using regularised
marching tetrahedra, described in [?], another fast method which generates
triangles with good aspect ratios.

Secondary volume estimate from surface. The volume of the triangulated
surface can be calculated by a variant of Gauss’ theorem [?] (the volume
in the surface_window of Fig. 1). Although this is less accurate than the
cubic planimetry volume [?], similarity between the volume estimates gives
confidence in the cubic planimetry volume.

4 Cubic Planimetry 

4.1 Volume from Arbitrarily Oriented Planes 

The equation for the volume v of any object defined from sequential
cross-sections is given by Watanabe [?] as

\[ v = \int_L \mathbf{s} \cdot d\boldsymbol{\omega} \tag{1} \]

where ω is the position vector of the centroid of the cross-sectional surface S
whose vector area is given by s, and L is the path of ω as the object is scanned.
Equation (1) can be evaluated discretely by approximating the integral using
the trapezoidal rule between each pair of slices, which gives

\[ v = \frac{1}{2} \sum_{i=2}^{N} (\mathbf{s}_i + \mathbf{s}_{i-1}) \cdot (\boldsymbol{\omega}_i - \boldsymbol{\omega}_{i-1}) \tag{2} \]

where N cross-sections have vector areas s_1, ..., s_N and centroids ω_1, ..., ω_N.

This approximation is equivalent to assuming that the surface area projected 
onto a plane normal to the path of the centroids, L, varies linearly from one 
slice to the next. This is clearly true for objects whose cross-sectional area does 
not vary, e.g. prisms, and in this case is the exact solution. Paraboloids also 
have this property. However, objects which are either more concave or more 
convex than a paraboloid will not be correctly approximated by this equation. 
For example, the volume of a cone will be overestimated, and that of a sphere or 
an ellipsoid will be underestimated. The error increases as the number of scan 
planes reduces. 

Equation (2) can easily be implemented on a computer once the cross-sections
have been determined and the areas and centroids calculated. In practice, the 
first step is by far the most time consuming, typically taking half a minute or 
so for each cross-section for manual segmentation. Once this has been done, the 
calculation of the volume is trivially fast in comparison (a few milliseconds). 

Clearly, some form of cubic rather than trapezoidal interpolation would in- 
crease the accuracy of the volume estimate and eliminate the bias towards 
paraboloids or prisms. It has been argued [2] that the small increase in accuracy
this would represent does not justify the additional complexity that would be 
required. However, two points can be made in defence of this approach. Firstly, 
the additional complexity is completely transparent to the user — once the al- 
gorithm has been implemented, the user performs precisely the same operations 
(i.e. outlining of the cross-sections) in both cases. Secondly, the reduction in the 
number of cross-sections required for an accurate volume estimation with cubic 
planimetry is very welcome, since segmentation is the time consuming step in 
the process. Results demonstrating this advantage are presented later in the paper.

4.2 2-D Representation of the Problem 

Interestingly, the whole problem can be reduced to finding the area of a care- 
fully constructed 2-D graph which represents a combination of the original 3-D 
object with the scanning pattern. The equivalence between the 3-D and 2-D 
representations is shown in Fig. 2.

Fig. 2. 3-D and 2-D representation equivalence: (a) 3-D representation (volume, v);
(b) 2-D representation (area, A). i) The length of a 2-D line, a, is
equivalent to the area of the cross-section, |s|. ii) The length of the line, c, is equal to
the magnitude of the vector, Δω. iii) The angle, α, between the line c joining the
centres of each line a and the normal to those lines is equal to the angle, θ, between
the vector area s and the vector joining the centroids of each area, Δω. iv) Similarly,
the angle, β, is equal to the angle, φ

The area enclosed within the dashed and solid lines in the 2-D representation
is equivalent to the volume which would be calculated by Watanabe’s trapezoidal
equation from the 3-D representation. This can be easily proved by considering
the 2-D representation to have a nominal thickness of 1 unit, and then applying
(2) to calculate the area

\[ A = \frac{1}{2} \sum_{i=2}^{N} (a_i \cos\alpha_i + a_{i-1} \cos\beta_i)\, c_i \tag{3} \]

Equation (2) can be similarly re-written as

\[ v = \frac{1}{2} \sum_{i=2}^{N} (|\mathbf{s}_i| \cos\theta_i + |\mathbf{s}_{i-1}| \cos\phi_i)\, |\Delta\boldsymbol{\omega}_i| \tag{4} \]

with Δω_i = ω_i - ω_{i-1}. If the variables in (3) and (4) are equated for all
values of i, then A = v. There
is, however, significant redundancy in this conversion. Firstly, only the product
of the lengths of the lines a and c is used, and hence an arbitrary scale factor
can be multiplied into one, so long as it is divided from the other. This has the
effect of stretching or shrinking the 2-D graph, but has no bearing on the volume
calculation. Secondly, only the cosines of the angles α and β are used, hence they
can be arbitrarily positive or negative. The effect of this choice is demonstrated
in Fig. 3.

Although the choice of angles has no effect on the volume calculated by
the trapezoidal method, it clearly does affect how well the 2-D representation
matches the original 3-D representation. Cubic interpolation involves the use of
information from several sequential slices and, therefore, an additional heuristic
rule is required to ensure that the angles α and β are chosen correctly.

Fig. 3. Choice of angles for 2-D representation



For each choice of angle α between the line joining centroids c_i and the area
representation a_i, the angle which c_i makes with c_{i-1} is also calculated. The
value of α is then chosen for which this calculated angle is closest to the 3-D
version (i.e. the angle which Δω_i makes with Δω_{i-1}). A similar rule is employed
for the angle β, using the area normals rather than the lines joining the centroids
as the reference.

The result of this entire process is shown for an ellipsoid in Fig. 4. The
ellipsoid was sliced with a scanning pattern which varied in position, azimuth, 
elevation and roll. The resulting 2-D graph retains some of the shape of the 
ellipsoid but also reflects the way in which it was scanned. 





(b) 2-D graph representation 



Fig. 4. Freehand scanned ellipsoid in 3-D and 2-D representations 



4.3 Cubic Interpolation of 2-D Representation 

If instead of joining the end points of the lines a with straight lines, a smooth 
curve is fitted between them, then the area enclosed by these curves should 
represent a more accurate measure of the volume of the original object. The 
curves must at least be cubic, since we would like to have continuity in at least 
the first derivative (i.e. the curves are smooth at the joints). They must also be
defined parametrically, since we expect them to have multiple values in both x
and y directions. 

The smoothest possible curve could be obtained by fitting an appropriate 
function through all the end-points simultaneously. However, this sort of global 
optimisation is in general costly to compute, which would violate one of the 
motivations for improving the volume calculation, namely that the increase in 
processing time is negligible. A less optimal but faster solution can be found by 
using parametric cubic splines. We require a spline which interpolates the control 
points, i.e. the resulting curve passes through the points which are used to define 
it. This can be achieved with a spline introduced by Catmull and Rom PI, which 
uses four sequential control points (in our case the end points of the lines a), 
fitting a curve between the middle two of these points. 

The first and last curve segments are necessarily a special case, since only 
three points can be used to fit the curve. There are a variety of ways of handling 
this, which can all be implemented by inventing an additional control point. If 
this is chosen to be the same as the second to last point, the effect is to place 
a null gradient constraint at the end point, which results in a rate of change of 
gradient of zero. Figure 5 shows the curve for the same situation as in Fig. 4
together with the actual curve which results from scanning in smaller steps. 
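The interpolating spline described above can be sketched as follows. This is a minimal illustration assuming the standard uniform Catmull-Rom basis; the function names (`catmull_rom_coeffs`, `evaluate`, `spline_segments`) and the sample points are hypothetical and are not taken from the original implementation. The phantom control point at each end is set equal to the neighbour of the end point, which gives a zero tangent there.

```python
def catmull_rom_coeffs(p0, p1, p2, p3):
    """Cubic coefficients (lowest power first) for each coordinate of the
    segment that runs from p1 (t = 0) to p2 (t = 1)."""
    coeffs = []
    for a, b, c, d in zip(p0, p1, p2, p3):
        coeffs.append([
            b,
            0.5 * (c - a),
            0.5 * (2 * a - 5 * b + 4 * c - d),
            0.5 * (-a + 3 * b - 3 * c + d),
        ])
    return coeffs


def evaluate(coeffs, t):
    """Point on the segment at parameter t."""
    return tuple(sum(k * t ** n for n, k in enumerate(c)) for c in coeffs)


def spline_segments(points):
    """One cubic segment between each pair of consecutive points.

    Assumption: each end is padded with a phantom control point equal to
    the second (or second-to-last) point, as one of several possible
    end conditions."""
    padded = [points[1]] + list(points) + [points[-2]]
    return [catmull_rom_coeffs(*padded[i:i + 4]) for i in range(len(points) - 1)]


# Hypothetical end points of four lines in the 2-D graph.
pts = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
segs = spline_segments(pts)
```

Each segment is returned as per-coordinate polynomial coefficients, so the same representation could be passed on to an area calculation working directly on the coefficients.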




Fig. 5. Cubic and linear interpolations compared with actual 2-D graph for a scanned
ellipsoid. The actual volume is 2.009. (a) the linear planimetry volume is 1.857 (92.4%),
and cubic volume is 1.989 (99.0%). (b) using linear planimetry with more cross-sections
gives 2.008 (99.9%). Panels: (a) cubic and linear, (b) more cross-sections, (c) (a) and (b) superimposed



Once the curves joining the end points of the lines a have been defined, the 
area enclosed by them can be calculated directly from the parametric coefficients 
of each curve. This calculation is based on the application of dU, and is given in 
Appendix A.

5 Results




Fig. 6. Results: Sphere, scanned using a fanning action. Panels: (a) cross-sections, (b) volume against number of scans, (c) surface

Fig. 7. Results: Cone, scanned using a linear sweep. Panels: (a) cross-sections, (b) volume against number of scans



In order to verify the accuracy of the volume measurement algorithm, a computer
simulation was constructed in which mathematical objects could be ‘scanned’
with freehand sweep patterns, and pre-segmented cross-sections generated. The
results of this process are shown for a sphere, a cone and a ‘baseball glove’
shape in Figs. 6, 7 and 8 (a more thorough investigation is reported in [12]).
The number of scans was varied in each case from 4 to 20, keeping the first
and last scans fixed. The graphs show the volume measurements, using linear
and cubic planimetry, and from a surface estimated with maximal disc guided
interpolation. Solid horizontal lines indicate the actual volume and a margin
of ±1%. Cross-sections and surfaces are displayed for the minimum number of
scans for which the cubic planimetry volume was within this margin.

Fig. 8. Results: Glove shape, scanned using a linear sweep. Panels: (a) cross-sections, (b) volume against number of scans, (c) surface

It is clear from these graphs that the cubic planimetry volume converges much 
faster than the alternative volume measurements — very few cross-sections are 
required to give an accuracy better than ±1%. This is the case both for com- 
plex shapes and freehand scanning patterns. In addition, maximal disc guided 
interpolation creates surfaces from these cross-sections which are good approxi- 
mations of the actual shapes. 

In order to validate in vivo volume measurements, the actual volume must 
be known. This can be done for the human bladder, by measuring the amount of 
voiding. The input to the bladder is difficult to measure, but can be estimated 
from sequential volume measurements in periods with no voiding. The bladder 
wall is very well defined by ultrasound, and is therefore easier to segment than, 
for instance, the kidney. 

Ten scans were performed of an initially full bladder, in pairs, with partial 
voiding between each pair. The bladder was completely voided after the eighth 
scan. The scans were performed in fast sequence, the output being collected 
for later measurement, in order to limit the amount of bladder filling during 
the experiment. The output was then measured using a 20ml or 60ml graded 
syringe (dependent on the volume) to an accuracy of approximately 1ml. The 
stored ultrasound B-scans were then segmented, using 15 to 20 cross-sections 
per examination: Figure 9(a) shows an example of this. Volumes calculated from
these cross-sections using cubic planimetry are tabulated in Table 1.

Fig. 9. Results: Human bladder. The bladder was partially voided four times during
the examination. Two sets of scans were recorded between each partial voiding. The
data is tabulated in Table 1. Panels: (a) organ surface

The amount of bladder filling was estimated in three stages. Firstly, the
linear rate of filling was calculated, for each pair of scans, from the volume
measurements. Secondly, cubic splines were used to interpolate these values and
give a continuous bladder filling rate. Thirdly, this function was integrated, to
give the estimated amount by which the bladder had filled at any point during
the experiment. This information was used to estimate the actual bladder volume
at any point in time. The resulting curve is shown in Fig. 9(b), along with the
estimated amount of voiding from this curve.
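The first of the three stages can be illustrated with the scan times and volumes listed in Table 1. The sketch below is an assumption about how the linear rate was formed (volume difference within a pair divided by the elapsed time); the helper names `to_minutes` and `pair_fill_rates` are hypothetical, and the spline interpolation and integration of the later stages are not reproduced.

```python
def to_minutes(stamp):
    """Convert an 'mm:ss' time stamp to minutes."""
    minutes, seconds = stamp.split(":")
    return int(minutes) + int(seconds) / 60.0


# Scan times (m:s) and cubic planimetry volumes (ml) as listed in Table 1.
times = ["00:00", "00:40", "02:54", "03:54", "05:17",
         "05:58", "07:28", "08:07", "10:26", "11:02"]
volumes = [342.7, 363.4, 360.9, 369.2, 306.0,
           319.8, 194.8, 206.2, 14.7, 20.3]


def pair_fill_rates(times, volumes):
    """Linear filling rate (ml/min) for each consecutive pair of scans."""
    rates = []
    for k in range(0, len(times), 2):
        dt = to_minutes(times[k + 1]) - to_minutes(times[k])
        rates.append((volumes[k + 1] - volumes[k]) / dt)
    return rates


rates = pair_fill_rates(times, volumes)
```

The five rates obtained this way are close to the Fill row of Table 1, though not identical for every pair, possibly because the tabulated times are rounded to the nearest second.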



Table 1. Results: Human bladder. Fill is the estimated rate at which the bladder 
was filling. Diff is the calculated difference in volumes, adjusted for bladder filling, 
and Void is the actual measured output. The interpolated volume and difference are 
shown graphically in Fig. 9(b)



Time, m:s      00:00   00:40   02:54   03:54   05:17   05:58   07:28   08:07   10:26   11:02
Volume, ml     342.7   363.4   360.9   369.2   306.0   319.8   194.8   206.2    14.7    20.3
Fill, ml/min       30.7             8.3            20.0            17.5             9.3
Diff, ml                    24.2            84.1           155.6           219.0
Void, ml                    25              74             156             234
Error, %                     1.6             6.8             0.1             3.2



The errors in Table 1 are calculated for the actual volume measurements,
rather than the amount of voiding. The amount of voiding is a complicated 
function of the volume measurements, due to the adjustments for bladder fill 
rate. Since it is essentially a measure of difference, the actual volume errors are 
assumed to add to give the voiding error, hence these errors are approximated 
to be half the voiding error. 
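As a check on this rule, the Error row of Table 1 can be recomputed from the Diff and Void rows. The sketch assumes that the voiding error is expressed relative to the measured output (Void); this normalisation is an assumption, chosen because it reproduces the tabulated figures. The function name `volume_error_percent` is hypothetical.

```python
# Diff and Void rows of Table 1 (ml).
diff_ml = [24.2, 84.1, 155.6, 219.0]
void_ml = [25.0, 74.0, 156.0, 234.0]


def volume_error_percent(diff, void):
    """Half of the relative voiding error, in percent."""
    return 0.5 * abs(diff - void) / void * 100.0


errors = [round(volume_error_percent(d, v), 1) for d, v in zip(diff_ml, void_ml)]
# errors -> [1.6, 6.8, 0.1, 3.2]
```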

6 Conclusions

Cubic planimetry allows accurate measurement of volume with sequential free- 
hand 3-D ultrasound. None of the algorithms presented place any restrictions 
on the scanning pattern, and they do not require the construction of a cuberille. 
Volumes can be measured to ±1% in simulation and ±7% in vivo, with typically 
only 10-20 cross-sections. The entire process, including scanning and manual 
segmentation of the organ of interest, can be completed in only 5-10 minutes. 
This makes the system both a practical and accurate method of measuring organ 
volume using ultrasound in a clinical setting.

A Area from Parametric Cubic Splines



Given two curves, each defined parametrically:

$$[\,x_i(t) \;\; y_i(t)\,] = [\,t^3 \;\; t^2 \;\; t \;\; 1\,] \begin{bmatrix} x_{i3} & y_{i3} \\ x_{i2} & y_{i2} \\ x_{i1} & y_{i1} \\ x_{i0} & y_{i0} \end{bmatrix}, \qquad i = 1, 2 \tag{4}$$

where $0 \le t \le 1$.

If each curve is connected to the other by two straight lines joining the points
t = 0 and t = 1, the enclosed area can be calculated from the application of dU:

$$A = \int_{t=0}^{t=1} \mathbf{s} \cdot d\boldsymbol{\omega} \tag{5}$$

where $\mathbf{s}$ is a vector normal to the line joining the curves at the same value of t,
and $\boldsymbol{\omega}$ is the position of the centre of that line:

$$\mathbf{s}(t) = [\,y_1(t) - y_2(t) \;\; x_2(t) - x_1(t)\,], \qquad d\boldsymbol{\omega}(t) = \tfrac{1}{2}\,[\,dx_1(t) + dx_2(t) \;\; dy_1(t) + dy_2(t)\,]. \tag{6}$$
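Because the integrand in (5) is a polynomial in t, the area can be evaluated exactly from the coefficients. The following is a minimal sketch of one way to do this, not the authors' implementation; polynomials are stored as coefficient lists with the lowest power first, and all function names are hypothetical. The result is a signed area, with the sign depending on the ordering of the two curves.

```python
def poly_mul(p, q):
    """Product of two polynomials (coefficient lists, lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q, sign=1.0):
    """p + sign * q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return [a + sign * b for a, b in zip(p, q)]


def poly_deriv(p):
    """Derivative with respect to t."""
    return [k * c for k, c in enumerate(p)][1:] or [0.0]


def poly_integral_01(p):
    """Definite integral over t in [0, 1]."""
    return sum(c / (k + 1) for k, c in enumerate(p))


def area_between(x1, y1, x2, y2):
    """Signed area enclosed by two parametric cubics and the straight
    lines joining their end points, following equations (5) and (6)."""
    sx = poly_add(y1, y2, -1.0)                       # y1 - y2
    sy = poly_add(x2, x1, -1.0)                       # x2 - x1
    dwx = [0.5 * c for c in poly_add(poly_deriv(x1), poly_deriv(x2))]
    dwy = [0.5 * c for c in poly_add(poly_deriv(y1), poly_deriv(y2))]
    integrand = poly_add(poly_mul(sx, dwx), poly_mul(sy, dwy))
    return poly_integral_01(integrand)


# Curve 1: (t, 0); curve 2: (t, t^2). The enclosed area has magnitude 1/3.
area = area_between([0, 1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0])
```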

Abstract. We suggest that identification and measurement of objects
in 3D images can be automatic, rapid and stable, based on the statistical 
properties of populations of medial primitives sought throughout the im- 
age space. These properties include scale, orientation, endness, and me- 
dial dimensionality. The property of medial dimensionality differentiates 
the sphere, the cylinder, and the slab, with intermediate dimensionality 
also possible. Endness results at the cap of a cylinder or the edge of a 
slab. The values of these medial properties at just a few locations provide 
an intuitive and robust model for complex shape. For example, the left 
ventricle during systole can be described as a large cylinder with an apical 
cap at one end, a slab-like mitral valve at the other (closed during systole),
and appropriate interrelations among components in terms of their
scale, orientation, and location. We demonstrate our method on simple 
geometric test objects, and show it capable of automatically identifying 
the left ventricle and measuring its volume in vivo using Real-Time 3D 
echocardiography. 



1 Introduction 

The lineage of the medial approach may be traced to the medial axis (otherwise 
known as symmetric axis or skeleton) introduced on binary images by Blum and 
developed by Nagel, Nackman, and others [1-3]. Pizer has extended the medial 
axis to gray-scale images producing a graded measure called medialness, which 
links the aperture of the boundary measurement to the radius of the medial axis 
to produce what has been labeled a core, a locus in a space of position, radius, 
and associated orientations [4, 5]. Methods involving these continuous loci of
medial primitives have proven particularly robust against noise and variation in 
target shape [6]. Determining locations with high medialness and relating them 
to a core has been accomplished by analyzing the geometry of loci resulting from 
ridge extraction [7]. Models including discrete loci of medial primitives have also 
provided the framework for a class of active shape models known as Deformable 
Shape Loci [8].

The objective of the work reported here is to build on these ideas to produce
a method for analyzing the shape of the heart in Real Time 3D ultrasound, 
a new imaging modality that uses a matrix array of transducer elements to 
scan the moving heart in 3D at more than 20 frames/second [9]. The approach 
to analyzing this data aims to extract the scale, orientation and dimensionality 
(shape type) of sections of cardiac anatomy by statistical analysis of populations 
of medial primitives. In particular, the primitives are identified by first searching 
for individual boundary points throughout the image in an initial sweep, and 
then by matching pairs of boundary points to form what are called core atoms. 
Core atoms tend to cluster along a medial ridge and allow for statistical analysis 
of the core and its underlying figure. Core atoms have already been developed 
for analysis of 2D shape [10] and are generalized here to 3D. The analysis is also 
extended to spatially sampled populations of core atoms. This research is part 
of a Ph.D. dissertation which covers many aspects in greater detail [11]. 



2 What is a Core Atom? 

A core atom is defined as two boundary points $b_1$ and $b_2$ that satisfy particular
requirements (described in detail below) guaranteeing that the boundaries face
each other. A core atom can be represented by a single vector $\vec{c}_{1,2}$ from the
first boundary point to the second. The core atom is said to be “located” at a
center point midway between the boundary points (see Fig. 1). The medialness
at the center point is high because the boundariness at both boundary points
is high and because the boundary normals face each other. Core atoms carry 
orientation, width and position, providing the ability for populations of core 
atoms to be analyzed in these terms. 




Fig. 1. A core atom consists of two boundary points that face each other across an 
acceptable distance, and a center point at which the core atom is said to be located. 
The search area (gray) for boundary point $b_2$ is determined by boundary normal $\hat{n}_1$.



Unlike medial models where object angle (half the angle between lines from 
the center point to each respective boundary point) is permitted to vary, the
object angle of a core atom is fixed at 90°. Core atoms thus follow in the tradition
of Brady [12]. As in Brady, the underlying figure is not required to have
parallel boundaries. In the experiments presented below, boundariness is based 
on a Difference of Gaussian (DOG) measurement of intensity gradient, accom- 
plished by repeated application of a binomial kernel. The number of applications 
determines the aperture of the boundariness detector, and is generally propor- 
tional to the size of the core atom. Further constraints are placed on the levels 
of intensity along the gradient direction. Other forms of boundariness, such as 
those based on texture analysis, could also be used for core atoms, provided an 
orientation is established for each boundary point. 

Boundariness vectors are sampled on a rectilinear grid, and their magnitude 
compared to a threshold to select a population of boundary points $b_i$ at locations
$\vec{x}_i$ with orientations $\hat{n}_i$ (“^” denotes normalization, $\hat{v} = \vec{v}/\|\vec{v}\|$). The strength
inherent to the statistics of populations is meant to counteract the weakness of 
thresholding. Core atoms are created from this population by finding pairs of 
candidate boundary points that satisfy the following three criteria: 

1. The magnitude of the core atom vector $\vec{c}_{1,2}$, i.e., the distance from one
   boundary point to the other, must be between $c_{min}$ and $c_{max}$.

   $$\vec{c}_{1,2} = \vec{x}_2 - \vec{x}_1, \qquad c_{min} \le \|\vec{c}_{1,2}\| \le c_{max}. \tag{1}$$

   The core atom vector can be oriented either way since the order of the
   boundary points is arbitrary.

2. The boundary points must have sufficient face-to-faceness, defined as

   $$F(b_1, b_2) = f_1 \cdot f_2, \qquad f_1 = \hat{c}_{1,2} \cdot \hat{n}_1, \qquad f_2 = \hat{c}_{2,1} \cdot \hat{n}_2. \tag{2}$$

   Since $f_1$ and $f_2$ are normalized to lie between +1 and −1, their product F
   must also lie between +1 and −1. Values for F near +1 occur when the
   boundaries face towards (or away from) each other across the distance between
   them. A threshold for acceptable face-to-faceness is set within some
   error $\epsilon_f$ such that $F(b_1, b_2) > 1 - \epsilon_f$.

3. Assuming $F(b_1, b_2) > 0$, it follows that $f_1$ and $f_2$ are both positive, or
   both negative. The sign of $f_1$ (or $f_2$) is called the polarity. The appropriate
   polarity is either + or − depending on whether the expected target is lighter
   or darker than the background.

Although at first glance the search for pairs of boundary points appears to
be $O(n^2)$, hashing individual boundary points beforehand by location yields a
large reduction in computation time. The search area for $b_2$ is limited to a solid
sector surrounding the orientation $\hat{n}_1$ of the first boundary point, and to a range
between $c_{min}$ and $c_{max}$. The width of the sector depends on $\epsilon_f$ (see Fig. 1).
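The three criteria can be sketched as a single test on a candidate pair of boundary points. This is an illustrative reading of criteria 1 to 3, not the authors' code; the function name `make_core_atom`, the dictionary layout and the parameter names are hypothetical, and the spatial hashing that restricts the search is omitted, so the caller is assumed to supply candidate pairs.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)


def make_core_atom(x1, n1, x2, n2, c_min, c_max, eps_f, polarity=+1):
    """Return a core atom if boundary points (x1, n1) and (x2, n2) pass the
    three criteria, otherwise None. Assumes c_min > 0 and 0 < eps_f < 1."""
    c12 = tuple(b - a for a, b in zip(x1, x2))
    length = math.sqrt(_dot(c12, c12))
    if not (c_min <= length <= c_max):        # criterion 1: distance range
        return None
    c12_hat = _unit(c12)
    c21_hat = tuple(-c for c in c12_hat)
    f1 = _dot(c12_hat, _unit(n1))
    f2 = _dot(c21_hat, _unit(n2))
    face_to_faceness = f1 * f2
    if face_to_faceness <= 1.0 - eps_f:       # criterion 2: face-to-faceness
        return None
    if (f1 > 0) != (polarity > 0):            # criterion 3: polarity
        return None
    center = tuple(0.5 * (a + b) for a, b in zip(x1, x2))
    return {"center": center, "vector": c12, "F": face_to_faceness}


# Two boundary points 4 units apart whose normals point at each other.
atom = make_core_atom((0, 0, 0), (1, 0, 0), (4, 0, 0), (-1, 0, 0),
                      c_min=1.0, c_max=10.0, eps_f=0.1)
```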

3 Three Basic Configurations: Sphere, Cylinder, and Slab

Observe that collections of core atoms can group in three basic ways corresponding
to the fundamental geometric shapes shown in Fig. 2. The surfaces are shown
in dark gray with the corresponding cores shown in light gray. Beneath each 
object is the population of core atoms that would be expected to form with such 
objects, the core atoms being depicted as simple line segments. 

The sphere generates a “Koosh ball” like cloud of core atoms with spherical 
symmetry, with the core atom centers clustered at the center of the sphere. The 
cylinder generates a “spokes-of-a-wheel” arrangement with radial symmetry 
along the axis of the cylinder, and the core atom centers clustered along the axis 
of the cylinder. The slab results in a “bed-of-nails” configuration across the slab, 
with core atom centers clustered in the mid-plane of the slab. It is reassuring to 
find that the cores of these basic objects are the point, the line, and the plane. 
As shown in Fig. 2, a system of shape-specific coordinate axes, namely $\hat{a}_1$, $\hat{a}_2$,
and $\hat{a}_3$, can be assigned in each case, although not all the axes are unique given
the symmetries involved. For example, in the slab, $\hat{a}_1$ and $\hat{a}_2$ can rotate freely
about $\hat{a}_3$. Such a set of coordinate axes can be found for any population of core
atoms using eigenanalysis, as will be shown below. Furthermore, the extent to 
which a core atom population resembles one of the three basic configurations 
depends on the corresponding eigenvalues. 



Fig. 2. Fundamental shapes (dark gray), corresponding cores (light gray), core atom
populations (line segments) and eigenvectors $\hat{a}_1$, $\hat{a}_2$ and $\hat{a}_3$. Panels: sphere, cylinder, slab



Given a population of m core atoms $\vec{c}_i$, i = 1, 2, 3, ... m, the analysis of a core
atom population begins by separating each core atom vector into its magnitude
$c_i$ and its orientation $\hat{c}_i$. We ignore, for the moment, the location of the core atom.
The analysis of magnitude $c_i$ over a population of core atoms is straightforward,
yielding a mean and standard deviation for the measurement of width in the
underlying figure. The orientation $\hat{c}_i$ of core atoms in a population lends itself
to eigenanalysis, yielding measures of dimensionality and overall orientation for
the population. We develop the eigenanalysis here in n dimensions, although for
the remainder of the paper n will be 3.

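Given the population of m vectors in n dimensions, we find an n-dimensional
vector $\hat{a}_1$ that is most orthogonal to that population as a whole by minimizing
the sum of squares of the dot product between $\hat{a}$ and each individual $\hat{c}_i$.

$$\hat{a}_1 = \arg\min_{\hat{a}} \sum_{i=1}^{m} (\hat{a} \cdot \hat{c}_i)^2 = \arg\min_{\hat{a}} \left( \hat{a}^T C \hat{a} \right) \quad \text{where} \quad C = \frac{1}{m} \sum_{i=1}^{m} \hat{c}_i \hat{c}_i^T. \tag{3}$$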



The C matrix is positive semi-definite, symmetric, and has a unit trace. Therefore,
its eigenvalues are non-negative and sum to 1, and its eigenvectors are orthogonal.
If the eigenvalues of C are sorted $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$, the corresponding
eigenvectors $\hat{a}_1 \dots \hat{a}_n$ are the axes of a coordinate system in which $\hat{a}_1$ is the
most orthogonal to the population $\hat{c}_i$ as a whole. For example, it would be the
axis of the cylinder in Fig. 2. Furthermore, the eigenanalysis guarantees that $\hat{a}_2$
is the most orthogonal to the population $\hat{c}_i$ among those directions that are
already orthogonal to $\hat{a}_1$. This process can be repeated until $\hat{a}_n$ remains the
least orthogonal to the population $\hat{c}_i$, representing a form of average orientation
for $\hat{c}_i$.
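The eigenanalysis can be sketched with NumPy as follows. This is a minimal illustration under the assumption that C is the mean outer product of the normalized core atom vectors; the function name `orientation_eigenanalysis` and the synthetic "spokes-of-a-wheel" population are hypothetical. `numpy.linalg.eigh` returns eigenvalues in ascending order, which matches the sorting used in the text.

```python
import numpy as np


def orientation_eigenanalysis(core_atom_vectors):
    """Eigenvalues (ascending) and eigenvectors (as columns) of the matrix C
    built from the orientations of a core atom population."""
    c = np.asarray(core_atom_vectors, dtype=float)
    c_hat = c / np.linalg.norm(c, axis=1, keepdims=True)   # orientations
    C = c_hat.T @ c_hat / len(c_hat)                       # unit trace
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    return eigenvalues, eigenvectors


# Synthetic cylinder along z: core atoms radiate in the x-y plane.
angles = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
spokes = np.column_stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)])
lam, axes = orientation_eigenanalysis(spokes)
```

For this population the smallest eigenvalue is zero and its eigenvector lies along the cylinder axis, while the remaining two eigenvalues are each one half.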



4 The Lambda Triangle 

Returning now specifically to 3D, the previous analysis yields three eigenvalues
which describe the dimensionality of the core.

$$\lambda_i \ge 0, \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1. \tag{4}$$

An eigenvalue of zero means that the corresponding eigenvector is perfectly
orthogonal to every core atom $\hat{c}_i$. Such is the case for $\hat{a}_1$ in the cylinder, and for
both $\hat{a}_1$ and $\hat{a}_2$ in the slab. In the sphere none of the eigenvectors is completely
orthogonal to the core atom population. Given the symmetries of the three basic
shapes, the eigenvalues shown in Fig. 3 result.

Since $\lambda_3$ is dependent on the other two, the system may be viewed as having
only two independent variables, $\lambda_1$ and $\lambda_2$. Because of constraints already mentioned,
possible values for $\lambda_1$ and $\lambda_2$ are limited by $\lambda_1 \le \lambda_2$ and $\lambda_2 \le (1 - \lambda_1)/2$,
which define a triangular domain we call the lambda triangle (Fig. 3).

Fig. 3. The lambda triangle defines the domain of possible eigenvalues. Sphere vertex: $\lambda_1 = \lambda_2 = \lambda_3 = 1/3$

The vertices of the lambda triangle correspond to the three basic shapes in
Fig. 2, with all possible eigenvalues falling within the triangle. A rather crude
simplification of dimensionality is possible by dividing the triangle into three
compartments to provide an integer description of dimensionality. Arbitrary
thresholds of $\lambda_1 = 0.2$ and $\lambda_2 = 1/3$ will be used to divide the triangle into such
areas of integer dimensionality to clarify our experimental results. However, it
should be remembered that the underlying dimensionality is not an integer or
even a single scalar, but rather two independent scalars, $\lambda_1$ and $\lambda_2$, whose values
are constrained by the lambda triangle.
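One possible reading of this partition is sketched below. The exact shape of the compartments in Fig. 3 is not recoverable from the text, so the sketch assumes that the $\lambda_1$ threshold separates the sphere compartment from the rest and that the $\lambda_2$ threshold then separates cylinder from slab. The integer returned is the dimension of the corresponding core (point, line, plane); the function names are hypothetical.

```python
def in_lambda_triangle(lam1, lam2, tol=1e-9):
    """True if (lam1, lam2) satisfies 0 <= lam1 <= lam2 <= (1 - lam1) / 2."""
    return -tol <= lam1 <= lam2 + tol and lam2 <= (1.0 - lam1) / 2.0 + tol


def integer_dimensionality(lam1, lam2, lam1_threshold=0.2, lam2_threshold=1.0 / 3.0):
    """Crude integer dimensionality of a core atom population.

    Assumption: lam1 above its threshold means sphere (0-D core); otherwise
    lam2 above its threshold means cylinder (1-D core), else slab (2-D core)."""
    if not in_lambda_triangle(lam1, lam2):
        raise ValueError("eigenvalues lie outside the lambda triangle")
    if lam1 > lam1_threshold:
        return 0, "sphere"
    if lam2 > lam2_threshold:
        return 1, "cylinder"
    return 2, "slab"


# The three vertices of the lambda triangle.
vertices = [integer_dimensionality(1 / 3, 1 / 3),
            integer_dimensionality(0.0, 0.5),
            integer_dimensionality(0.0, 0.0)]
```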



5 Spatial Sampling in the Corona 

We now return to the question of core atom location, which we have so far 
ignored. To incorporate location into the analysis of core atoms, we sort them 
into bins on a regular 3D lattice by the location of their center points. Thus each 
bin represents a spatial sampling of medialness. The number of core atoms in a 
sample volume can be thought of as the medial density at that location. 
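The binning step can be sketched as follows, assuming cubic sample volumes of side `bin_size` aligned with the coordinate axes. The function name `bin_core_atoms`, the bin size and the example centers are hypothetical choices for illustration.

```python
from collections import defaultdict
import math


def bin_core_atoms(centers, bin_size):
    """Group core atom indices by the lattice cell containing their center."""
    bins = defaultdict(list)
    for index, center in enumerate(centers):
        key = tuple(math.floor(coord / bin_size) for coord in center)
        bins[key].append(index)
    return bins


# Hypothetical core atom center points.
centers = [(0.2, 0.3, 0.1), (0.8, 0.9, 0.4), (1.5, 0.2, 0.2), (-0.1, 0.0, 0.0)]
bins = bin_core_atoms(centers, bin_size=1.0)

# Medial density: number of core atoms in each occupied sample volume.
density = {key: len(members) for key, members in bins.items()}
```

Keeping the indices rather than only the counts allows the orientation statistics of each sample volume to be computed afterwards from its own members.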

How do we choose an appropriate size for the sample volume? As we shall 
see, the local distribution of core atoms can have a significant cross section, and 
the density within that distribution may not be uniform. To preserve resolution, 
the sample volume needs to be smaller than the typical cross section of a core 
atom cloud. When a core is sampled off-center, it will demonstrate a distortion 
in its dimensionality. For example, the zero-dimensional core at the center of a 
sphere will appear to be one-dimensional (cylindrical) when sampled off center, 
as shown in Fig. 4. The vector from the theoretical core (center of the sphere) 
to the center of the density in the sample volume is called the displacement 
vector p (See Fig. 4C). The core atom population within a sample volume may 
not contain the entire thickness of the core, but rather a sub-sampling of the 
core called a coronal density. We can generally expect, in fact, to be sampling 
coronal densities. It would be helpful to know where, in a given cloud around the 






G.D. Stetten and S.M. Pizer 



core, a sample was collected, but that presupposes knowledge about the overall 
distribution of core atoms which we may not have. 

We can, at least, predict certain relationships to exist between the distribu- 
tion of core atoms over the entire core and that of a sample volume displaced 
from the center of the core. The displaced sample of core atoms will be flattened 
in a plane orthogonal to p, and thus develop orthogonality to that direction. This 
can be seen in Fig. 4, where the spherical distribution of core atoms in 4B has 
been flattened into a cylindrical distribution in 4C. The same effect can be seen 
in the case of the cylinder in Fig. 5, where, displaced off the central axis of the 
cylinder by p, the population of core atoms becomes slab-like and orthogonal 
to p. One expects the displacement vector to be one of the eigenvectors at the 
closest point on the theoretical core, because (1) the displacement vector will be 
orthogonal to the core at that point, and (2) the normal to the core is always one 
of its eigenvectors. In 3D, the medial manifold can have at most 2 dimensions 
and thus will always have such a normal. 






Fig. 4. A. sphere. B. all core atoms C. cylindrical coronal density displaced by p 




Fig. 5. A. cylinder. B. all core atoms C. slab-like coronal density displaced by p 



Figs. 4 and 5 suggest that the displacement vector p could somehow be used 
to compensate for the dimensional distortion in the corona. However, an isolated 
density that is, for example, cylindrical cannot know whether it represents the 
true center of a cylinder or simply the corona of a sphere. The results of the 
eigenanalysis for each density may be used in a Hough-like fashion simultaneously
to vote for its own dimensionality and center of mass, and for possible
densities whose corona it may inhabit. The voting takes place within ellipsoids
around each density. The axes of each ellipsoid are long in directions orthogonal 
to the core atom population in its density. Thus the ellipsoid can be expected 
to extend in the p direction, orthogonal to the core atoms. 




Fig. 6. Ellipsoids of three coronal core atom densities coalescing at the true center. Legend: core atom population, corresponding ellipsoid, object



Fig. 6 demonstrates this concept. A circular cross-section through an object 
is shown with three coronal densities (each containing 3 core atoms) displaced 
from the center. An ellipsoid is associated with each density, with the major axis 
of each ellipsoid along the eigenvector most orthogonal to the corresponding 
core atoms. The three ellipsoids intersect at the center of the circle. The figure
can be interpreted as the cross-section of a sphere with the populations of core 
atoms being cylindrical (seen in cross-section) and the ellipsoids intersecting at 
the center of the sphere (as in Fig. 4). Alternatively it can be interpreted as the 
cross-section of a cylinder with the populations of core atoms being slab-like and 
the ellipsoids intersecting along the axis of the cylinder (as in Fig. 5). There are 
various ways to construct such ellipsoids. We have chosen the following heuristic 
for its simplicity. The axes of each ellipsoid are the eigenvectors of its density’s 
C matrix. The lengths $a_i$ of the axes are related to the eigenvalues as follows:

$$a_1 = \gamma c, \qquad a_2 = \frac{\alpha_2}{\alpha_1} a_1, \qquad a_3 = \frac{\alpha_3}{\alpha_1} a_1 \qquad \text{where} \quad \alpha_i = 1 - \lambda_i, \quad \gamma = \frac{1}{2}. \tag{5}$$

The scalar distance c is the mean diameter of the core atoms in the density, and
the dimensionless number γ relates c to the size of the ellipsoid, determining
how many neighbors will be reached. The ellipsoids make it possible to cluster
the core atoms for a given cloud, in effect to coalesce the corona. Each sample
volume (the votee) receives votes from all the neighboring sample volumes whose
ellipsoids overlap it. The votes from those ellipsoids are assigned a strength v,
where $v = m \cdot \exp(-d_e^2)$, m being the number of core atoms in the voting density,
and $d_e$ the ellipsoidal distance
from the center of the voter ellipsoid to the votee, $\vec{d}$ being the vector from
the voter to the votee. Votes are constructed to contain information about the
voter, including its C matrix which may simply be summed (scaled by v) for an
eigenanalysis of the entire constituent core atom population of a particular candidate.
Thus are formed what we call superdensities, clusters of core atoms that
no longer suffer from coronal distortion. The center of mass for the constituent
core atom population of a superdensity will tend to be at the true core, rather
than in the corona.
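The axis lengths of (5) and the vote strength can be sketched as below. The expression for the ellipsoidal distance is not available in this text, so the sketch assumes the usual form in which the squared distance is the sum over axes of the squared projection of the voter-to-votee vector divided by the squared axis length; this form, and the function names, are assumptions rather than the authors' definition.

```python
import math


def ellipsoid_axis_lengths(eigenvalues, mean_diameter, gamma=0.5):
    """Axis lengths a_1, a_2, a_3 from equation (5); eigenvalues ascending."""
    alpha = [1.0 - lam for lam in eigenvalues]
    a1 = gamma * mean_diameter
    return [a1 * (a / alpha[0]) for a in alpha]


def vote_strength(m, d, axes, lengths):
    """Vote strength v = m * exp(-d_e^2) for a votee displaced by d.

    Assumption: d_e^2 = sum_i ((d . a_i) / length_i)^2, where `axes` holds
    the unit eigenvectors of the voter's C matrix."""
    d_e_squared = sum(
        (sum(di * ai for di, ai in zip(d, axis)) / length) ** 2
        for axis, length in zip(axes, lengths)
    )
    return m * math.exp(-d_e_squared)


# Cylindrical density (eigenvalues 0, 1/2, 1/2) with mean diameter 2.
lengths = ellipsoid_axis_lengths([0.0, 0.5, 0.5], mean_diameter=2.0)
axes = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
along_axis = vote_strength(10, (0.0, 0.0, 1.0), axes, lengths)
across_axis = vote_strength(10, (1.0, 0.0, 0.0), axes, lengths)
```

With these numbers the ellipsoid is twice as long in the direction orthogonal to the core atoms as in the other two directions, so a votee displaced along that direction receives the stronger vote.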



6 Tests with Parametric Objects 

To validate these methods, we applied them to three parametric test objects 
with simple geometries: a sphere, a torus, and a spherical shell. The torus is 
basically a cylinder of varied and known orientation, and the spherical shell is 
likewise a slab of varied and known orientation. (The sphere is simply itself.) 

Eigenanalysis of the coronal densities collected in a rectilinear lattice of sam- 
ple volumes yielded the following results. Fig. 7 shows all densities containing 
greater than 1% of the entire core atom population plotted on the lambda trian- 
gle. The sphere shows two groups of densities, one near the top (sphere) vertex 
of the triangle and another near the right (cylinder) vertex, consistent with the 
dimensional effects of the corona predicted in Fig. 4. The torus, which is locally a 
cylinder, shows clustering near the right (cylinder) vertex, with some spreading 
towards the left (slab) consistent with the dimensional effects of the corona pre- 
dicted in Fig. 5. The spherical shell, which is locally a slab, shows tight clustering 
at the left (slab) vertex consistent with the observation that core atoms in a slab 
are collinear with p and therefore will not develop significant orthogonality. 

Unfortunately, Fig. 7 does not contain spatial information about the sampled 
densities. The spatial distribution of densities for the test objects is shown in Fig.
8. Each sample volume whose density contains more than 1% of the total core
atoms is shown as a thin-lined symbol. The simple partition of the lambda 
triangle in Fig. 3 is used to decide between three possible symbols: a slab is 
represented as a single line, a cylinder as a cross, and a sphere as 3 intersecting 
axes. The length of the thin lines is constant, chosen for clarity in each test 
object. The orientation of the thin lines indicates the predominant direction(s) 
of core atoms in each density, i.e. across the slab, or orthogonal to the axis of the 
cylinder, keeping in mind that perfect spheres have no predominant orientation 
and perfect cylinders allow arbitrary rotation around the axis. 

As expected the sphere shows cylindrical densities in its corona oriented towards
the center. Further out from the center a few slab-like densities reflect
simply the paucity of core atoms in those sample volumes. Near the center one
true spherical density (a small 3-axis symbol) may be discerned. The thick-lined
symbols show the results of ellipsoidal voting, i.e., they represent superdensities.
To prevent a cluttered illustration, superdensities are limited to non-overlapping
constituencies. They are represented by thick lines in a manner similar to the
densities, except the length of the axes now corresponds to the actual mean scale
of the constituent core atoms. Thus the thick-lined 3-axis cross indicates the
actual diameter of the spherical object. For the sphere there is only one predominant
winning superdensity, with virtually every core atom in its constituency.
The torus shows cylindrical densities properly oriented but dispersed throughout
the corona. At the outer regions of the corona a few slab-like densities are
visible. The superdensities, by contrast, are centered on the circular mid-line
of the torus. The spherical shell shows only slab-like densities, which coalesce
with ellipsoidal voting into slab-like superdensities. The orientations of both are
across the local slab. Ellipsoidal voting is seen to perform another function, that
of connecting densities that share a core along the mid-plane of a slab or the
axis of a cylinder.

Fig. 7. Distribution of densities on lambda triangles, for parametric test objects

Fig. 8. Densities and superdensities for parametric objects

7 Endness

Some attention must be paid to cases where a cylinder ends at a hemispherical 
cap, or a slab ends at a hemicylindrical edge. The property of endness has been 
described by Clary, et al. [13]. Endness as viewed from the core atom perspective
is illustrated in Fig. 9A and 9B. To detect endness, densities of core atoms are 
used as starting points. Once a local cylinder has been established, boundary 
points are sought along the axis of the cylinder in either direction as evidence 
of a cap. Similarly, once a local slab has been found, boundary points indicating 
an edge can be sought. Mathematics for this is derived elsewhere [11]. 




Fig. 9. Endness, manifested as a cap on a cylinder (A) and the edge of a slab (B) 



8 Identifying and Measuring the Cardiac Left Ventricle 

We now turn to a useful clinical application, the automated determination of 
left ventricular volume using Real Time 3D (RT3D) echocardiography. RT3D is 
a new imaging modality that electronically scans a volume in 3D using a matrix 
array instead of the conventional linear array. RT3D is described in detail elsewhere
[9], but its primary novelty is the ability to capture a single cardiac cycle
at 22 frames/second, which no other available imaging modality can accomplish. 

RT3D images of an in vivo human heart present a significant challenge to 
image analysis techniques, including high noise, low resolution, path dependence, 
and a non-rectilinear data space. These problems are addressed elsewhere [11],
but the suggestion that the statistical nature of our method yields robustness is 
severely tested in its application to RT3D echocardiography. 

We now expand on the example from the abstract: The left ventricle during 
systole is basically a large cylinder with an apical cap at one end, and a slab-like 
mitral valve at the other (we limit ourselves here to apical scans, and to times 
when the mitral valve is closed). The model is shown in Fig. 9C. To identify 
the cylinder in the image data, core atoms of an appropriate range of diameters 
Fig. 10. Real Time 3D ultrasound with automated and manual identification of LV: (A) cylinder of ventricle; (B) slab of mitral valve; (C) automated axis and surface map; (D) manual tracings of ventricle



were collected in sample volumes on a regular lattice, and ellipsoidal voting was 
applied. An example of the resulting superdensities is displayed in Fig. 10A.
Crosses are shown in the cylindrical chamber of the ventricle. Due to the pre- 
selection of core atoms by scale, no other significant densities of core atoms were 
found. 

Next, the mitral valve was sought, by limiting the formation of core atoms to 
an appropriately smaller scale, and to orientations nearly perpendicular to the 
transducer. As shown in Fig. 10B, the strongest superdensities (short vertical
line segments) were clustered around the center of the mitral valve, although 
weaker false targets were detected in the myocardium. To eliminate these false 
targets, a criterion was established for the formation of appropriate pairs of su- 
perdensities, in the spirit of core atoms. Only slab-like densities appropriately 
located and oriented with respect to cylindrical densities were accepted. These 
pairs were allowed to vote for their constituent superdensities, and the mean 
location of the winning superdensities used to establish a single mitral valve location
and a single LV cylinder location. The vector between these two locations
was used to establish a cone for expected boundary points at the apex of the LV, 
and the mean distance to the resulting boundary points used to determine the 
location of the apical cap along that vector. Thus an axis between the apex and 
the mitral valve was established. Given this axis, LV volume was estimated by 
collecting boundary points around the axis. Only boundaries that faced the axis 
were accepted. The boundary points were organized into bins using cylindrical 
coordinates, in other words, disks along the axis and sectors within each disk. 
An average radius from the axis was established for the boundary points in each 
bin, creating a surface map of the endocardial surface. Fig. 10C shows such a
surface map (dots) and the underlying axis. The problem of empty bins was 
avoided by convolving the surface map with a binomial kernel in 2D until each 
bin had some contribution to its average radius. Volumes were then calculated 
by summing over all sectors. The entire procedure including identification and 
volume measurement of the LV was automated, and required approximately 15 
seconds on a 200 MHz Silicon Graphics O2 computer.
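The binning and summation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the (z, theta, r) input format, and the choice to smooth the per-bin radius sums and counts jointly (rather than the averaged map itself) are assumptions made here for the example.

```python
import math

def smooth(grid):
    """One pass of a separable binomial [1, 2, 1] / 4 kernel over disks x sectors."""
    n_d, n_s = len(grid), len(grid[0])
    # Sectors wrap around the axis, so they are smoothed periodically.
    tmp = [[0.25 * row[(s - 1) % n_s] + 0.5 * row[s] + 0.25 * row[(s + 1) % n_s]
            for s in range(n_s)] for row in grid]
    # Disks do not wrap; the end disks are replicated.
    return [[0.25 * tmp[max(d - 1, 0)][s] + 0.5 * tmp[d][s]
             + 0.25 * tmp[min(d + 1, n_d - 1)][s]
             for s in range(n_s)] for d in range(n_d)]

def lv_volume(points, n_disks, n_sectors, axis_length):
    """Volume from boundary points given as (z, theta, r) about the axis.

    z is the distance along the axis in [0, axis_length), theta the angle in
    [0, 2*pi), and r the distance of the boundary point from the axis.
    """
    if not points:
        raise ValueError("at least one boundary point is required")
    sums = [[0.0] * n_sectors for _ in range(n_disks)]
    counts = [[0.0] * n_sectors for _ in range(n_disks)]
    for z, theta, r in points:
        d = min(int(z / axis_length * n_disks), n_disks - 1)
        s = min(int(theta / (2 * math.pi) * n_sectors), n_sectors - 1)
        sums[d][s] += r
        counts[d][s] += 1.0
    # Repeat the 2D binomial smoothing until every bin has some contribution.
    while any(c == 0.0 for row in counts for c in row):
        sums, counts = smooth(sums), smooth(counts)
    d_theta = 2 * math.pi / n_sectors
    d_z = axis_length / n_disks
    volume = 0.0
    for d in range(n_disks):
        for s in range(n_sectors):
            r = sums[d][s] / counts[d][s]          # average radius in this bin
            volume += 0.5 * r * r * d_theta * d_z  # wedge of a disk
    return volume
```

Each bin contributes the volume of a wedge, one sector of one disk, so boundary points lying on a cylinder of radius 2 and length 10 should give a volume close to 40*pi even when some bins start out empty.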

The automated volumes were compared to manual tracings performed on a 
stack of flat slices orthogonal to a manually-placed axis (see Fig. 10D). This axis
employed the same anatomical end-points (the ventricular apex and the center 
of the mitral valve) as the axis determined automatically above. The volumes 
and locations of the end-points were compared to those determined automati- 
cally. Results are shown in Fig. 11. They are very encouraging, particularly for 
the automated placement of the axis end points, which had an RMS error of 
approximately 1 cm. Volume calculations introduced additional errors of their 
own, but were still reasonable for ultrasound. Only four cases have been tried, 
and all are shown. The method worked in all cases. 



9 Conclusions 

We have described a new method for identifying anatomical structures using fun- 
damental properties of shape extracted statistically from populations of medial 
primitives, and have demonstrated its feasibility by applying it under challenging 
conditions. Further studies are presently underway to establish reliability over 
a range of data. Future directions include introducing greater specificity and 
adaptability in the boundary thresholding, incorporating more than 2 nodes 
into the model, introducing variability into the model to reflect normal variation 
and pathologic anatomy, extending the method to the spatio-temporal domain, 
and applying it to visualization. 
Abstract. We describe a method for computing a continuous time estimate
of dynamic changes in tracer density using list mode PET data.
The tracer density in each voxel is modeled as an inhomogeneous Pois- 
son process whose rate function can be represented using a cubic B-spline 
basis. An estimate of these rate functions is obtained by maximizing the 
likelihood of the arrival times of each detected photon pair over the con- 
trol vertices of the spline. By resorting the list mode data into a standard 
sinogram plus a “timogram” that retains the arrival times of each of the 
events, we are able to perform efficient computation that exploits the 
symmetry inherent in the ordered sinogram. The maximum likelihood 
estimator uses quadratic temporal and spatial smoothness penalties and 
an additional penalty term to enforce non-negativity. Corrections for 
scatter and randoms are described and the results of studies using sim- 
ulated and human data are included. 



1 Introduction 

Dynamic PET imaging usually involves the collection of a series of frames of 
sinogram data over contiguous time intervals that can range in duration from 
10 seconds to over 20 minutes. Data from each of the frames is independently 
reconstructed to form a set of images. These images can then be used to esti- 
mate physiological parameters |B|. This approach involves selection of the set of 
acquisition times, where one must choose between collecting longer scans with 
good counting statistics but poor temporal resolution, or shorter scans that are 
noisy but preserve temporal resolution. List mode data acquisition avoids this 
problem by allowing frame durations to be determined after acquisition. Alter- 
natively, the problem of temporal binning can be avoided entirely by directly 
using the arrival times in the list mode data to estimate a dynamic image. 

Snyder developed a list mode maximum likelihood (ML) method for es- 
timation of dynamic PET images using inhomogeneous Poisson processes. Each 
voxel has an associated time-varying tracer density that is modeled using basis 
functions that are based on assumptions about the physiological processes gener- 
ating the data, e.g. blood activity curves convolved with a basis of exponentials. 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 98-111, 1999.

© Springer-Verlag Berlin Heidelberg 1999
The observed list mode PET data are then inhomogeneous Poisson processes
whose rate functions are linear combinations of the dynamic voxel tracer den- 
sities. Here we follow a similar approach but instead work with rate functions 
formed as a linear combination of known basis functions. Not only does the lin- 
earity of the model lend itself to efficient computation of the estimates, but also 
we can better represent the dynamic activity seen in experimental data that is 
not well modeled by the more restrictive physiological models. 

A second advantage of using list mode data arises in cases where the number 
of detected photon pairs in a particular study is far less than the total number 
of detector pairs. This is often the case in modern 3D PET systems which can 
have in excess of 10® sinogram elements in a single frame. To reduce this num- 
ber to manageable proportions, the data are often rebinned by adding nearby 
elements together. Alternatively, the raw list mode data can be stored and the
need for rebinning is avoided. Barrett et al. CCH! describe a list mode maxi- 
mum likelihood method for estimation of a temporally stationary image. While 
this method will often reduce storage costs and avoid the need for rebinning, 
the random spatial ordering of the detected events in the list mode data does 
not lend itself to fast forward and backprojection and exploitation of the many 
symmetries in 3D projection matrices mm- To avoid this problem we use a 
hybrid combination of the standard sinogram and list mode formats that allows 
the reconstruction algorithm to exploit the same matrix symmetries used in our 
static imaging work m- All events in a dynamic study are collected into a single 
standard sinogram; this is then augmented by a “timogram” which contains the 
arrival times of each event stored so that they are indexed using the values in 
the associated sinogram. 

In this paper we present a method for reconstructing a continuous time es- 
timate of a dynamic PET image using list mode data and the theory of inho- 
mogeneous Poisson processes. A general B-spline model is used to represent the 
dynamic activity in each voxel so that the dynamic image is parameterized by 
a sequence of control vertex “images” where the control vertices are the coef- 
ficients for the spline basis. Tomographic projections of these control vertices 
produce the control vertices for the rate functions of the inhomogeneous Pois- 
son processes representing coincidence detections between each detector pair. 
A maximum likelihood estimate of the control vertices for each voxel can then 
be computed using the standard likelihood function for inhomogeneous Poisson 
processes. The final result is a temporally continuous representation of
the PET image that utilizes the temporal resolution of list mode data.

Our parameterization of the inhomogeneous Poisson rate function is appli- 
cable to any linear combination of basis functions. This form encompasses the 
parametric imaging work of Matthews m, Snyder jzn and mixture models of 
O’Sullivan [Ej. We also note that Ollinger [IS| used list mode data to reconstruct
rate functions as histograms with adaptive bin-widths; our work could be
viewed as a continuous-time extension of this. For this paper we consider only
cubic B-splines. The key advantage of B-splines is that they have systematic
compact support. In particular, for any point on a cubic spline only 4 basis functions
are nonzero. Also, simple closed forms exist for all derivatives and integrals
of a polynomial spline. 

Since inhomogeneous Poisson rate functions are unnormalized densities, we 
note that the density estimation literature using splines is closely related to 
our work (e.g. M). The standard methods involve exponentiated splines or 
squared splines. While these implicitly constrain the rate function to be positive, 
they cannot be represented with a linear basis. As there are substantial compu- 
tational savings to having a common basis for all voxels and projections, we did 
not pursue these approaches. 

The paper is organized as follows. We describe the model and maximum likelihood
method in Sections 2 and 3, respectively. Methods for selecting the spline
knot points and methods for randoms and scatter correction are included in Section 4.
Computational considerations including resorting data into a timogram
format and the details of the algorithm used for computing the ML estimate are
given in Section 5. In Section 6 we demonstrate the performance of the method
with some preliminary simulation and experimental results.



2 Dynamic Modeling Using Inhomogeneous Poisson Processes

We model the positron emissions from each voxel in the volume as an inhomo- 
geneous Poisson process. The rate function for the voxel represents, to within a 
scalar calibration factor, the time varying PET tracer density. We parameterize 
the rate functions using a cubic B-spline basis: 

$$\eta_j(t) = \sum_{\ell} w_{j\ell} B_\ell(t), \qquad \eta_j(t) \ge 0 \;\; \forall t,$$

where $\eta_j(\cdot)$ is the rate function for voxel $j$, $w_{j\ell}$ is the $\ell$th basis weight (control
vertex) for voxel $j$, and $B_\ell(t)$ is the $\ell$th spline basis function. The problem of
reconstructing the dynamic PET image is then reduced to estimating the control
vertices for each voxel.
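As an illustration of this parameterization, the sketch below evaluates a cubic B-spline basis by the standard Cox-de Boor recursion and forms a voxel rate function as a weighted sum of the basis functions. The function names and the example knot vector are hypothetical and are not taken from the study.

```python
def bspline_basis(knots, degree, t):
    """Values of all B-spline basis functions at time t (Cox-de Boor recursion).

    With len(knots) = L + degree + 1 the result has L entries. Knot intervals
    are half-open, so t equal to the last knot gives all zeros.
    """
    n_knots = len(knots)
    # Degree 0: indicator function of each half-open knot interval.
    b = [1.0 if knots[i] <= t < knots[i + 1] else 0.0 for i in range(n_knots - 1)]
    for p in range(1, degree + 1):
        nxt = []
        for i in range(n_knots - 1 - p):
            left = right = 0.0
            if knots[i + p] > knots[i]:                # skip repeated knots
                left = (t - knots[i]) / (knots[i + p] - knots[i]) * b[i]
            if knots[i + p + 1] > knots[i + 1]:
                right = ((knots[i + p + 1] - t)
                         / (knots[i + p + 1] - knots[i + 1]) * b[i + 1])
            nxt.append(left + right)
        b = nxt
    return b

def rate(weights, knots, t, degree=3):
    """Rate function at time t: weighted sum of the basis functions."""
    return sum(w * v for w, v in zip(weights, bspline_basis(knots, degree, t)))

# Hypothetical clamped knot vector: first and last 4 knots identical, L = 7.
example_knots = [0, 0, 0, 0, 10, 25, 60, 140, 140, 140, 140]
```

At any time inside the support at most 4 of the cubic basis functions are nonzero and they sum to one, so a set of equal weights reproduces a constant rate.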

We denote by pij the probability of detecting at detector pair i a photon 
pair produced by emission of a positron from voxel j. The probabilities pij 
are identical to those used in static PET imaging. Here we use the factored 
matrix forms developed in HH]. Assuming that the detection probabilities are 
independent and time invariant, it follows that coincidence detection at detector 
pair i is also an inhomogeneous Poisson process with rate function 



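$$\lambda_i(t) = \sum_j p_{ij}\, \eta_j(t) = \sum_{\ell} \Big( \sum_j p_{ij} w_{j\ell} \Big) B_\ell(t) \qquad (1)$$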



where the right-most term demonstrates that the rate functions for the data are 
also B-splines. 
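The linearity argument can be checked numerically with a toy example. The detection probabilities, control vertices and basis values below are made-up numbers chosen only for illustration; the point is that projecting the control vertices first and then evaluating the spline gives the same detector-pair rate as evaluating each voxel rate and then projecting.

```python
def project_control_vertices(P, W):
    """Map voxel control vertices W[j][l] to detector-pair control vertices."""
    n_l = len(W[0])
    return [[sum(p_ij * W[j][l] for j, p_ij in enumerate(row)) for l in range(n_l)]
            for row in P]

def spline_value(coeffs, basis_values):
    """Spline value at one time point, given the basis values at that time."""
    return sum(c * b for c, b in zip(coeffs, basis_values))

# Toy problem: 2 detector pairs, 3 voxels, 2 basis functions (all hypothetical).
P = [[0.5, 0.2, 0.0],
     [0.1, 0.3, 0.4]]
W = [[1.0, 2.0],
     [0.0, 4.0],
     [3.0, 1.0]]
B_t = [0.3, 0.7]   # basis function values at some fixed time t

# Route 1: evaluate each voxel rate, then forward project.
eta = [spline_value(w_j, B_t) for w_j in W]
lam_direct = [sum(p * e for p, e in zip(row, eta)) for row in P]

# Route 2: forward project the control vertices, then evaluate the spline.
lam_spline = [spline_value(c_i, B_t) for c_i in project_control_vertices(P, W)]
```

Because the second route needs only one projection per basis function, independent of how many time points are evaluated, it is the form that makes a common basis for all voxels computationally attractive.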
The Poisson process observed at the detectors is corrupted by random and
scatter components that can also be modeled as inhomogeneous Poisson processes.
Combining the three components, we have the model:

$$\lambda_i^*(t) = \lambda_i(t) + r_i(t) + s_i(t)$$

where $r_i(\cdot)$ and $s_i(\cdot)$ are the randoms and scatter rate functions for detector pair
$i$ and $\lambda_i^*(t)$ is the rate function for the process actually observed at detector pair
$i$. In estimating the rate function parameters $w_{j\ell}$ we will assume that the rate
functions for the random and scatter components have been determined through
a calibration procedure and can be treated as known processes.

For a Poisson process with rate function $\lambda(t)$, with $N$ events observed from
time $T_0$ to $T_1$ and event arrival times $a_1, \ldots, a_k, \ldots, a_N$, the likelihood function
is P 2 ]

$$P(a_1, \ldots, a_N \mid \lambda(t)) = \Big\{ \prod_{k=1}^{N} \lambda(a_k) \Big\} \exp\Big\{ -\int_{T_0}^{T_1} \lambda(u)\, du \Big\}. \qquad (2)$$

For $N = 0$, the product is defined as unity.

For the set of independent events recorded in the list mode data the log
likelihood is therefore given by

$$L(\mathcal{D} \mid \mathcal{W}) = \sum_i \sum_k \log \lambda_i^*(a_{ik}) - \sum_i \int_{T_0}^{T_1} \lambda_i^*(u)\, du, \quad \text{s.t. } \lambda_i^*(t) \ge 0 \;\; \forall t, \qquad (3)$$

where $\mathcal{D}$ denotes the list mode data and $\mathcal{W}$ the set of parameters for the rate
functions. We represent the data as $\mathcal{D} = \{x, a_1, \ldots, a_i, \ldots, a_I\}$, where
$x = \{x_1, \ldots, x_i, \ldots, x_I\}$ are the sinogram count data, and
$a_i = \{a_{i1}, \ldots, a_{ik}, \ldots, a_{i x_i}\}$ is the set of $x_i$ event arrival times at detector pair $i$.
For the B-spline basis, $\mathcal{W} = \{w_{j\ell} \mid \ell = 1, \ldots, L;\ j = 1, \ldots, J\}$ is the set of basis
coefficients. While $x$ is a function of $a$ and hence redundant, we use the sinogram counts to index the
arrival times, as described in Section 5.1.
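A minimal sketch of this log likelihood is given below, assuming each detector-pair rate function is available as an ordinary Python callable. The integral is approximated with a midpoint rule purely for brevity; for spline rate functions a closed form exists and would normally be preferred. All names are hypothetical.

```python
import math

def log_likelihood(arrival_times, rate, t0, t1, n_quad=2000):
    """Log likelihood of the arrival times of one inhomogeneous Poisson process.

    Sum of log-rates at the arrival times minus the integral of the rate over
    [t0, t1]. With no events only the integral term remains.
    """
    log_term = sum(math.log(rate(a)) for a in arrival_times)
    h = (t1 - t0) / n_quad
    integral = h * sum(rate(t0 + (q + 0.5) * h) for q in range(n_quad))
    return log_term - integral

def total_log_likelihood(times_by_pair, rates, t0, t1):
    """Sum over independent detector pairs of the per-pair log likelihood."""
    return sum(log_likelihood(a, lam, t0, t1)
               for a, lam in zip(times_by_pair, rates))
```

For a constant rate of 2 events per second observed over 10 seconds with 3 events, this evaluates to 3 log 2 - 20, which matches the closed form.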



3 Penalized Maximum Likelihood Estimation 

We estimate the image control vertex values that define our dynamic image using 
penalized maximum likelihood. The objective function of the statistical model 
is modified with three regularizing terms 

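$$L^*(\mathcal{D} \mid \mathcal{W}) = L(\mathcal{D} \mid \mathcal{W}) - \alpha\, \rho(\mathcal{W}) - \beta\, \phi(\mathcal{W}) - \gamma\, \nu(\mathcal{W}). \qquad (4)$$

The terms $\rho(\mathcal{W})$ and $\phi(\mathcal{W})$ regularize temporal and spatial roughness, respectively;
$\nu(\mathcal{W})$ penalizes negativity of the image rate functions; $\alpha$, $\beta$ and $\gamma$ are the
tuning parameters. We now describe each of these terms.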

We employ a temporal smoothing term to control the roughness of the spline 
rate functions 0. The form of the roughness penalty is the integrated squared 
curvature. For voxel $j$ this is

$$\int \big[ \eta_j''(t) \big]^2 \, dt.$$

Fortunately, for cubic splines this quantity has a simple expression, a quadratic
form of the control vertices (0, pg. 238). We denote the symmetric, banded
matrix of this quadratic form $Q$. Thus the temporal roughness penalty is given
by

$$\rho(\mathcal{W}) = \sum_j \sum_{\ell} \sum_{\ell'} w_{j\ell}\, Q_{\ell\ell'}\, w_{j\ell'}.$$

We regularize the estimates of the control vertices using a spatial smooth- 
ing function equivalent to the pair-wise quadratic penalty used previously in 
penalized ML ^ and Bayesian estimation m of static PET images: 

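$$\phi(\mathcal{W}) = \sum_{\ell} \sum_j \sum_{j' \in \mathcal{N}_j} \kappa_{jj'} \, (w_{j\ell} - w_{j'\ell})^2,$$

where $\mathcal{N}_j$ denotes a set of neighbors of voxel $j$ and $\kappa_{jj'}$ is the reciprocal of the
Euclidean distance between voxel $j$ and $j'$. Other possible choices of the penalty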
function include the discrete approximation of the thin plate spline bending 
energy H2] or a non-quadratic edge preserving function such as that described 

in p|. 

We now justify applying the same regularization to the control vertices as has 
previously been applied to images. First, the spline basis is the same for all voxels, 
so the control vertices have the same meaning for all voxels. Second, each member 
of the spline basis has limited support so that the effect of spatial smoothing 
is localized in time. Lastly, the B-spline basis we use is well conditioned P|, 
meaning that small changes in the control vertices produce small changes in the 
spline function. Hence if we expect two rate functions to be similar, then it is 
sufficient to constrain their control vertices to be similar. 

The optimization method must account for the non-negativity of the image 
rate functions $\eta_j(t)$. We use unconstrained optimization with a penalty function
m- The problem is complicated somewhat in that the control vertices them- 
selves are not necessarily non-negative; instead we need to ensure that the cor- 
responding spline does not become negative. The local extrema of a cubic spline 
have a closed form, so we initially tried penalizing negative local minima.
This approach complicated the gradient and Hessian and made their evaluation 
prohibitively slow. 

Instead we simply penalize negative values computed at a finite number of 
time points. The vector z contains the locations at which we enforce positivity. 
It is constructed by uniformly spacing points in each inter-knot interval. Any 
elements of z for which the spline is negative are penalized with the square of 
the spline value, resulting in the penalty: 

$$\nu(\mathcal{W}) = \sum_j \sum_m \Big( \min\Big\{ 0, \sum_{\ell} w_{j\ell} B_\ell(z_m) \Big\} \Big)^2.$$
This approach does not necessarily ensure that the spline is non-negative
everywhere. However, we have found that when used in combination with the 
temporal roughness penalty, the resulting estimates do not become negative, 
except possibly in the intervals just preceding a large increase in activity. 
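The sampled negativity penalty can be sketched in a few lines. The text does not say exactly where the uniformly spaced points sit within each inter-knot interval, so the helper below assumes the midpoints of equal sub-intervals; both function names are hypothetical.

```python
def check_points(knots, per_interval):
    """Uniformly spaced check points in each inter-knot interval.

    Assumption for this sketch: each interval between distinct knots is split
    into `per_interval` equal parts and the midpoint of each part is used.
    """
    distinct = sorted(set(knots))
    pts = []
    for a, b in zip(distinct, distinct[1:]):
        step = (b - a) / per_interval
        pts.extend(a + (m + 0.5) * step for m in range(per_interval))
    return pts

def negativity_penalty(spline_values):
    """Sum of squared negative parts; spline_values[j][m] is voxel j at point m."""
    return sum(min(0.0, v) ** 2 for row in spline_values for v in row)
```

Only negative samples contribute, so a spline that is non-negative at every check point incurs no penalty, even if it dips below zero between check points.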

It is straightforward to show that each of the four terms in the penalized
likelihood (4) has a negative semi-definite Hessian matrix. Their null spaces
only intersect at the zero vector. Therefore, the objective function is strictly
concave and has a unique global maximum which can be found by a gradient-based
search algorithm.

4 Calibration Procedures 

4.1 Selection of Knot Spacing 

Before proceeding to the estimation we must decide on the spacing between 
knots in the B-spline basis. A cubic B-spline basis is defined by knot locations, 
$u = (u_1, \ldots, u_{L+4})$, where $L \ge 4$ is the number of basis elements and the
first and last 4 knots are identical, to allow discontinuity at the end points. 
Uniformly spaced knots will not be efficient for most tracer studies since early 
changes in concentration have much greater magnitude than those later in the 
study. While we do not attempt to adaptively place the knots, in a modest 
attempt to optimize knot placement, we use the head curve to define knots that 
produce approximately equal arc lengths, as suggested in 0. The head curve is 
a temporal histogram using all of the list mode data and it serves as an estimate 
of the average rate function. Once the knot locations are determined, the actual 
basis functions are computed using the recurrence relations as described in m- 
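One way to place knots at approximately equal arc lengths along a sampled head curve is sketched below. Arc length depends on how the two axes are scaled; here both time and activity are normalised to unit range, which is an assumption of this example rather than something stated in the text. The function name is hypothetical.

```python
import math

def equal_arc_length_knots(times, head, n_interior):
    """Interior knot times that split the normalised head curve into equal arc lengths.

    `times` must be strictly increasing; `head` holds the head-curve samples.
    """
    t_span = times[-1] - times[0]
    h_span = (max(head) - min(head)) or 1.0   # avoid dividing by zero for a flat curve
    cum = [0.0]                               # cumulative arc length at each sample
    for k in range(1, len(times)):
        dt = (times[k] - times[k - 1]) / t_span
        dh = (head[k] - head[k - 1]) / h_span
        cum.append(cum[-1] + math.hypot(dt, dh))
    knots = []
    k = 1
    for m in range(1, n_interior + 1):
        target = cum[-1] * m / (n_interior + 1)
        while cum[k] < target:
            k += 1
        # Linear interpolation inside the segment that contains the target length.
        frac = (target - cum[k - 1]) / (cum[k] - cum[k - 1])
        knots.append(times[k - 1] + frac * (times[k] - times[k - 1]))
    return knots
```

A flat head curve yields uniformly spaced knots, while a curve with a sharp early peak pulls the knots toward the start of the study, which is the behaviour the text argues for.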

4.2 Randoms and Scatter Rate Functions 

To apply the penalized likelihood estimation procedure described above, we 
should first apply calibration procedures to account for the presence of scat- 
tered and random events in the list mode data. We note that the simple ran- 
doms subtraction method that is used in static imaging is not applicable to list 
mode data. While neither randoms nor scatter are included in the preliminary
results presented here, they are described for completeness and will be essential 
in extracting accurate quantitative dynamic information from our results. 

The randoms rate varies approximately as the square of the true coincidence 
rate. We can model the randoms rate for each detector pair using an inhomoge- 
neous Poisson process: 



$$r_i(t) = \sum_{\ell} \gamma_{i\ell} B_\ell(t),$$

where $\gamma_{i\ell}$, $\ell = 1, \ldots, L$, are the control vertices for the randoms component in
the $i$th line of response (LOR). The list mode data produced on the ECAT HR+
(CTI Systems, Knoxville, Tennessee) contains both prompt (on-time) and delayed
events. We can use the delayed events to compute an ML estimate of these control
vertices. The number of counts per LOR in the delayed events is typically quite 
small so that these estimates would probably exhibit high variance. However, 
after scaling for variations in individual detector sensitivities, there is a high 
degree of spatial smoothness in the mean randoms sinogram m Consequently 
we can use a penalized ML estimate in which substantial spatial smoothing is 
used to regularize the estimator. By choosing the knot spacing for the randoms 
rate functions to be at the same locations as for the image rate functions, the 
separate treatment of randoms in the estimation algorithm below produces little 
increase in the computational cost. 

The spatio-temporal scatter distribution is a function of both the dynamic 
tracer distribution and the object. We assume no interaction between the tem- 
poral and spatial distribution and scale a fixed spatial scatter estimate over 
time. While this is a rather crude approximation, we anticipate that it will be 
reasonably accurate due to the very smooth nature of the scatter contribution 
to the sinogram. However, for certain ligand studies of the brain, where the 
tracer eventually binds solely to subcortical structures, this approximation may 
perform poorly. 

Integrating the coincidence detections over time yields a sinogram from which 
we estimate the spatial scatter distribution using the simulation method in |2n|’ 
Let $S_i$ denote the estimated scatter contribution at the $i$th LOR. Next we calculate
a least-squares spline estimate of the head curve using the same B-spline
basis as the dynamic study; we normalize this spline to integrate to unity. Denote
this estimate as $h(t) = \sum_{\ell} h_\ell B_\ell(t)$, where $h_\ell$ are the control vertices of the head
curve spline fit. The estimated scatter rate function is then

$$s_i(t) = S_i\, h(t) = S_i \sum_{\ell} h_\ell B_\ell(t).$$

Note that when computing $S_i$ and $h(t)$ we subtract the delayed events from the
prompts to correct for randoms.



5 Computational Considerations and Image Estimation 

5.1 The “Timogram” 

The raw list mode data is in a form that is inconvenient for computing the 
gradient of the penalized likelihood function. The list mode events arrive in 
random spatial order and hence require random rather than sequential access to 
the control vertices that define the rate functions in the sinogram domain. We 
have therefore developed a means to store list mode data in sinogram form while 
preserving the temporal information. This is achieved using a single standard 
sinogram that contains all detected events augmented by a second file listing 
the arrival times of all events sorted in projection order. We call this second 
file the “timogram” . The timogram simply consists of the arrival times of each 
event. The sinogram is required to indicate how many arrival times to read for 
each bin. The resulting pair of files can be substantially smaller than either the
original list mode data file or the set of sinograms that would be stored in a 
conventional dynamic study. We note that Ollinger m also resorted list mode 
data prior to reconstruction, though his format did not completely eliminate the 
random spatial order. 
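The resorting step can be sketched as a counting sort keyed on the sinogram bin. This is only an in-memory illustration with hypothetical names; the real format is stored in two files, and the bin indexing scheme is not specified here.

```python
def build_sinogram_timogram(events, n_bins):
    """Resort list mode events into a sinogram plus a timogram.

    `events` is a list of (bin_index, arrival_time) pairs in acquisition order.
    The sinogram holds the count per bin; the timogram holds all arrival times
    grouped by bin, in projection order, with time order kept within each bin.
    """
    sinogram = [0] * n_bins
    for b, _ in events:
        sinogram[b] += 1
    # Starting offset of each bin's block of arrival times.
    offsets = [0] * n_bins
    for b in range(1, n_bins):
        offsets[b] = offsets[b - 1] + sinogram[b - 1]
    timogram = [0.0] * len(events)
    cursor = list(offsets)
    for b, t in events:
        timogram[cursor[b]] = t
        cursor[b] += 1
    return sinogram, timogram

def arrival_times_by_bin(sinogram, timogram):
    """Read the timogram sequentially, using the sinogram counts as an index."""
    pos = 0
    for b, count in enumerate(sinogram):
        yield b, timogram[pos:pos + count]
        pos += count
```

Because the sinogram count tells the reader how many times to consume for each bin, the timogram needs no per-event bin address, and the reconstruction can sweep both arrays in a single sequential pass.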

ECAT HR+ list mode data consists of a sequence of 4-byte event words,
each either a coincident event or a timing event. The coincident events record 
the sinogram bin, optional gating information, and are identified as “prompt” 
or “delay” . The timing events are inserted in the list mode stream every 1 mil- 
lisecond, and they also record time with a 27 bit integer. By re-encoding the 
arrival time of each coincidence event using 16 bits, we can retain a temporal 
resolution of 256 ms and a maximum acquisition time of 4.6 hours. Using this
format we need only 2 bytes per event in the timogram. Thus we can discard all 
of the timing events in the list mode data and save a factor of two in the space 
required to store the remaining coincidence arrival times. The space savings from 
discarding the timing events are significant. For example, in a 90 minute scan, 
the timing events take more space than a 3D sinogram set and hence the raw 
list mode data will always take more space than the sinogram-timogram, even 
if no coincidences are detected! 

The sinogram-timogram format will also be more space efficient than a multi- 
frame sinogram when the space required to store the event arrival times in the 
timogram is less than the 2nd through nth sinograms. For example, an 11 frame 
acquisition is 10 frames larger (~ 200MB larger) than a sinogram-timogram with 
no events; only after 200MB-worth of events, or 100 million counts are stored 
will the sinogram-timogram be less space efficient. 
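The storage figures quoted in this section follow from simple arithmetic, reproduced below. The sketch assumes 4-byte list mode words (consistent with the stated factor-of-two saving from 2-byte timogram entries), decimal megabytes, and the roughly 200 MB figure for ten extra sinogram frames given above; the constant and function names are hypothetical.

```python
BYTES_PER_LIST_WORD = 4        # assumption: 32-bit list mode event words
BYTES_PER_TIMOGRAM_EVENT = 2   # 16-bit re-encoded arrival time
TICK_MS = 256                  # temporal resolution of the re-encoded time

def encode_time(ms):
    """Re-encode an arrival time in milliseconds as a count of 256 ms ticks."""
    return ms // TICK_MS

def max_acquisition_hours(bits=16, tick_ms=TICK_MS):
    """Longest acquisition representable with `bits` bits of `tick_ms` ticks."""
    return (2 ** bits) * tick_ms / 1000.0 / 3600.0

def timing_event_bytes(scan_minutes):
    """Bytes used by the timing words inserted every millisecond."""
    return scan_minutes * 60 * 1000 * BYTES_PER_LIST_WORD

def break_even_events(extra_sinogram_bytes):
    """Number of events whose timogram entries fill the given number of bytes."""
    return extra_sinogram_bytes // BYTES_PER_TIMOGRAM_EVENT
```

Under these assumptions a 16-bit counter of 256 ms ticks covers about 4.66 hours, the timing words of a 90 minute scan occupy 21.6 MB, and 200 MB of timogram corresponds to 100 million events.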

The sinogram-timogram format could be made even more compact by stor- 
ing inter-arrival times and then performing entropy-based compression 0 . The 
motivation for this is that LOR’s with high activity will tend to have short inter- 
arrival times, hence will have many high bits consistently zero, a property that 
compression can exploit. 

5.2 Preconditioned Conjugate Gradient Based Reconstruction

A preconditioned conjugate-gradient method was used to maximize the objec- 
tive function. The particular method closely follows our previous work on static 
reconstructions gSTEI, so we only describe the method briefly here. We use the 
following preconditioned Polak-Ribiere form of the conjugate gradient method. 
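$$\begin{aligned}
\mathcal{W}^{(n+1)} &= \mathcal{W}^{(n)} + \alpha^{(n)} s^{(n)} \\
s^{(n)} &= d^{(n)} + \beta^{(n-1)} s^{(n-1)} \\
d^{(n)} &= C^{(n)} g^{(n)} \\
\beta^{(n-1)} &= \frac{\big(g^{(n)} - g^{(n-1)}\big)^{T} d^{(n)}}{g^{(n-1)T} d^{(n-1)}}
\end{aligned}$$

where $g^{(n)}$ is the gradient vector of the penalized likelihood (4) at $\mathcal{W} = \mathcal{W}^{(n)}$,
$C^{(n)}$ is a preconditioner, and the step size $\alpha^{(n)}$ is found using a Newton-Raphson
line search.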
In this study $C^{(n)}$ was chosen analogously to the static PET reconstruction
as

$$C^{(n)} = \mathrm{diag}\left\{ \frac{|w_{j\ell}^{(n)}| + \delta}{\sum_i p_{ij}} \right\},$$

where $\delta$ is a small positive number to ensure that $C^{(n)}$ is positive definite. Here
we set $\delta$ equal to $0.01 \max_{j\ell} \{ |w_{j\ell}^{(n)}| \}$.

The algorithm was initialized with a constant image for which the forward 
projected rate function matches the average rate of the data after subtracting 
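scatters and randoms. The search vector is initialized by setting $s^{(0)}$ equal to the
preconditioned gradient $d^{(0)} = C^{(0)} g^{(0)}$. At each iteration we test whether the search
vector is an ascent direction, i.e. $g^{(n)T} s^{(n)} > 0$; if it is not, then we reinitialize the PCG
algorithm with $s^{(n)} = d^{(n)}$.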

The logarithm in the likelihood function requires that the line search in the step size $\alpha^{(n)}$
is performed with the hard constraint that the forward projected rate function
at any arrival time is non-negative, i.e.

$$\lambda_i(a_{ik}) \ge 0, \quad \forall i, k.$$

The negativity penalty in (4) is soft, allowing small negative values. The hard
constraint can be satisfied by limiting the step size in the update step of the 
conjugate gradient algorithm. To minimize the effect of this constraint on the 
convergence rate, we use a bent, rather than truncated, line search HH. 
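The structure of a preconditioned Polak-Ribiere iteration with an ascent-direction restart can be illustrated on a toy problem. The sketch below maximizes a concave quadratic, f(w) = b.w - w.A.w/2, as a stand-in for the penalized likelihood, so a single Newton step gives the exact line search. It uses a fixed diagonal preconditioner and omits the hard non-negativity constraint and the bent line search. Every name in it is hypothetical.

```python
def pcg_maximize(A, b, precond, w, n_iter=50, tol=1e-12):
    """Maximize f(w) = b.w - 0.5 * w.A.w by preconditioned conjugate gradient.

    A is a symmetric positive definite matrix (list of rows) and `precond`
    holds the diagonal of the preconditioner.
    """
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    g = [bi - avi for bi, avi in zip(b, matvec(A, w))]   # gradient of f at w
    d = [c * gi for c, gi in zip(precond, g)]            # preconditioned gradient
    s = list(d)                                          # initial search vector
    for _ in range(n_iter):
        gd = dot(g, d)
        if gd < tol:                                     # converged
            break
        As = matvec(A, s)
        alpha = dot(g, s) / dot(s, As)                   # exact step for a quadratic
        w = [wi + alpha * si for wi, si in zip(w, s)]
        g_new = [gi - alpha * asi for gi, asi in zip(g, As)]
        d_new = [c * gi for c, gi in zip(precond, g_new)]
        # Polak-Ribiere coefficient in its preconditioned form.
        beta = dot([gn - go for gn, go in zip(g_new, g)], d_new) / gd
        s = [dn + beta * si for dn, si in zip(d_new, s)]
        if dot(g_new, s) <= 0.0:                         # not an ascent direction
            s = list(d_new)                              # restart
        g, d = g_new, d_new
    return w
```

For the actual penalized likelihood the step size has no closed form, which is why an iterative line search with a limit on the step length is needed there.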



6 Simulation Studies and Performance Evaluation 

6.1 Simulation Study 

We evaluated our method with simulated and real data. We simulated a blood 
flow data set using a single slice of the Hoffman brain phantom. We evaluated the 
simulated data on the basis of instantaneous rate accuracy as described below. 
The real data consisted of one 2D subset of a 10 minute 3D water list mode 
brain study. Our subjective evaluations focused on tissues that are known to 
have distinctly different dynamics with this tracer. 

The simulated data was a simplified model of the dynamics of a bolus injection
of $^{15}$O-water using tissue time activity curves generated by the Kety
autoradiographic model (0, Figure 3B). We chose two extreme curves, one cor- 
responding to very high blood flow, one to very low blood flow. White matter 
voxels were assigned to have low blood flow, gray matter voxels to have high 
blood flow. We used an 11 element B-spline basis with support from 0 to 140 
seconds; the spacing of the knot locations was determined by equally spacing
7 points along a medium blood flow curve. We used 7 negativity penalty points
(dz) in each knot interval. Approximately 5 million counts were generated for 
this data set. 

As a preliminary evaluation we computed the mean squared error (MSE) 
between the true source and the instantaneous rate estimate at three times, t = 
10, 23 and 60 seconds. We compared this MSE to that obtained by estimating
the instantaneous rate with a static sinogram based on events arriving in the
interval [t - d/2, t + d/2], for d = 1, 2, 4, 10, and 20 seconds. In both cases
the MSEs are based on one realization, the mean taken over voxels. While
this comparison of instantaneous rate accuracy could be regarded as unfair,
since the static data has no information on the nonstationarity of the tracer
distribution, it is comparable to existing methods. We did not attempt to match
the spatial smoothness (bias) of the two methods; for each d, the static data sets
were reconstructed with an ML estimate ($\beta = 0$) with 25 iterations.
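The static comparison estimate amounts to counting events in a window centred on t. The sketch below pairs that estimator with a thinning simulator for an inhomogeneous Poisson process so the trade-off can be explored on a single rate function. It is a one-dimensional illustration only, with no tomographic reconstruction, and the function names are hypothetical.

```python
import random

def simulate_arrivals(rate, rate_max, t_end, rng):
    """Simulate an inhomogeneous Poisson process on [0, t_end) by thinning.

    `rate_max` must bound rate(t) from above on the whole interval.
    """
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_max)          # candidate from a homogeneous process
        if t >= t_end:
            return arrivals
        if rng.random() * rate_max < rate(t):   # keep with probability rate(t) / rate_max
            arrivals.append(t)

def window_rate(arrivals, t, d):
    """Static-frame estimate of the rate at t: events in [t - d/2, t + d/2] over d."""
    return sum(1 for a in arrivals if t - d / 2 <= a <= t + d / 2) / d
```

Short windows give noisy estimates, while long windows average over changes in the rate, which is the bias described for the 20 second frames near the peak of the high-flow curve.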

Figure 1 shows the results of the blood flow simulation for 120 iterations.
The rate functions for six voxels are shown in top left; there is generally good 
agreement between true and estimated functions. The plot of instantaneous mean 
squared error is shown top right. The spline estimates (horizontal lines) have 
appreciably lower MSE than all static estimates with frame durations less than 
10 seconds. Note that of the 20 second static estimates, the one with largest 
MSE occurs at t = 23 seconds, corresponding to the mode of the high-flow 
curve. This is expected since averaging across greater durations from the mode 
will bias the static estimate downward; at the other time points the rate function 
is approximately linear and there will be less bias in the static estimate. 

6.2 Human Study 

For the real data we used a 15 element B-spline basis with support over the 
whole acquisition duration, 0 to 600 seconds; knot spacing was determined by 
approximate equal spacing of 11 points along the head curve; again dz = 7. 
The subject was injected with a 5 mCi (~ 200 MBq) bolus of $^{15}$O-water approximately
30 seconds after the start of 3D data acquisition. To create the 2D
data set we rebinned data from eight ring pairs into a single dataset with about 
400,000 counts, using only the prompt events. 

Figure 2 shows the results of the human study after 40 iterations. A three
panel image shows the tracer distribution at 20, 60 and 120 seconds post injection.
At 20 seconds the carotid arteries are visible, especially the right one; at 60
seconds the water has perfused the brain and surrounding tissue and the carotids
are still visible; at 120 seconds the carotids are indistinguishable from background
tissue though the brain still has increased activity. This differing temporal character
is clear from the plot of selected voxels. The carotid artery shows a sharply
peaked distribution, while brain tissue rises later and more smoothly; the sinus
region has much lower flow though its rate function shows a similar character
to that of the brain tissue.

[Figure 1 panel titles: Spline Estimate for t=23 sec; Truth for t=23 sec; 20 Second Static Estimate for t=23 sec]

Fig. 1. This figure shows the results of the blood flow simulation. The top left plot 
shows the estimated (dashed) and true (solid) rate functions; the vertical lines (dotted)
indicate knot locations. The top right plot shows the mean squared error over the image 
for estimating the instantaneous rate at 3 time points; the horizontal lines are for the 
spline estimate, the decreasing curves show mean squared error for the static estimates 
of different frame lengths. The bottom two rows show instantaneous rate images; the 
top row is for t=23 seconds, the bottom row for t=60 seconds. The left column is the 
spline estimate, the center column is the truth and the right column is the estimate 
from the longest static acquisition; the truth images have circles noting the location of
the voxels plotted top left.

Fig. 2. This figure shows the result of our method using real data. The top row shows
a sequence of instantaneous rate images for 20, 60 and 120 seconds post-injection.
(Injection occurred at $t \approx 30$ seconds.) The early arrival and fast clearance of the
tracer in the carotid artery is apparent, as the carotid is visible in the left and center 
images, but not in the right. The bottom right shows the estimated rate functions for 4 
individual voxels; the vertical dotted lines are the knot-locations. The image on left is 
the total counts image; circles indicate the location of the 4 voxels plotted; the bilateral 
circles mark the right and left cerebellum, the circle outside of brain tissue is the sinus 
and the other circle is the right carotid artery.

7 Discussion and Conclusions

We have presented preliminary results on estimating continuous time dynamic 
PET images from list mode PET data. We modeled the dynamic tracer den- 
sity as an inhomogeneous Poisson process and parameterized the rate functions 
with a B-spline basis. We introduced the timogram as a means to compactly 
represent the temporal information of list mode data. The B-spline basis and 
the timogram’s spatial ordering both contribute to an efficient implementation 
that makes the creation of continuous time reconstructions feasible. 

We have presented basic performance analysis with arbitrarily chosen tun- 
ing parameters for spatial and temporal regularization. While these results are 
encouraging in general, Monte Carlo simulations are needed to assess bias and 
variance in ROI’s and in the image at different parameter values. 

Estimating images of physiological parameters is a possible extension of this 
work. This could be accomplished either through embedding the physiological 
model in the rate function (as in m) or estimating parameters with the spline 
functions. The standard to compare these results to would be estimates from the 
temporally binned data. In fact, temporally binned data could also be applied 
in this inhomogeneous Poisson framework, as there is a counterpart to equation 
for binned data 1221.

Abstract. In this paper, we propose extensions to a powerful geometric
shape modeling scheme introduced in M- The extension allows the
model to automatically cope with topological changes and for the first 
time, introduces the concept of a global shape into geometric/geodesic 
snake models. The ability to characterize global shape of an object using 
very few parameters facilitates shape learning and recognition. In this 
new modeling scheme, object shapes are represented using a parame- 
terized function - called the generator - which accounts for the global 
shape of an object and the pedal curve/surface of this global shape with 
respect to a geometric snake to represent any local detail. Traditionally, 
pedal curves/surfaces are defined as the loci of the feet of perpendiculars 
to the tangents of the generator from a fixed point called the pedal point. 
We introduce physics-based control for shaping these geometric models 
by using distinct pedal points - lying on a snake - for each point on 
the generator. The model dubbed as a “snake pedal” allows for interac- 
tive manipulation via forces applied to the snake. Automatic topological 
changes of the model may be achieved by implementing the geometric 
active contour in a level-set framework. We demonstrate the applicability 
of this modeling scheme via examples of shape estimation from a variety 
of medical image data. 



1 Introduction 

Extracting shapes of anatomical structures from medical image data is a challenging
problem in Medical Image Analysis and has been the focus of
numerous researchers in the medical imaging community over the past several
years. Since the inception of active contours/surfaces, a.k.a. snakes, in the
vision/graphics community by Kass et al. [5], these elastically deformable
contours/surfaces have been widely used for a variety of applications including
Medical Image Analysis, where they have facilitated boundary detection and
representation, motion tracking, etc. of anatomical structures of interest. The classical
approach to object shape recovery using snakes is based on deforming an
initial configuration of the snake, represented by a position vector $v_0$, towards the
boundary of the shape being detected by minimizing a functional that can be
regarded as the bending energy stored in a thin flexible beam/rod or a stretchable
string subject to loading. There are several problems associated with this
approach, such as initialization, the automatic specification of the physical or
elasticity parameters, etc. Moreover, this energy model requires that the topology
of the shape to be estimated be known a priori. Several researchers have
addressed these issues in detail and some of them are open research issues to
date.

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 1 1 2- 112^ 1999.



A viable alternative to the snakes model was proposed by Malladi et al. 0 
and Caselles et al. p. These models are based on the theory of curve evolu- 
tion and geometric flows. Automatic changes in topology can be handled in a 
natural way in this modeling technique, by implementing the curve evolution 
using the level-set embedding schemes. A generalization of this model was later 
proposed simultaneously by Caselles et. al., |2| and Kichenassamy et. al., |0|. 
The generalization also known as the geometric active contours showed the link 
between the Kass et al. [5] snakes and the geometric active contours, a.k.a.
geodesic/ geometric snakes. For details on the theory of curve/surface evolu- 
tion and its level-set implementation, we refer the reader to j1 1819121611 211 311^ . 
Geometric active contours and their variants are quite successful in recovering
shapes from medical as well as non-medical images. They do not suffer from
the initialization problems, do not have too many user-specified parameters and
can handle arbitrary topologies in an elegant manner. One might ask, what then
is lacking in these models? Firstly, there is no way to characterize the global 
shape of an object or anatomical structure, which is a useful property to have in 
describing shape for identification purposes. Secondly, it is not easy to incorpo- 
rate prior shape information. In this paper, we will address these problems and 
propose a novel modeling scheme along with efficient numerical techniques for 
use in shape recovery from image data. 



1.1 Overview of the Hybrid Geometric Active Contour Model 

In many Medical Imaging applications such as shape recognition, character- 
izing the global shape of an object is crucial. Traditional geometric active con- 
tour/surface models do not possess the capability to characterize the global shape 
of an object. In this paper, we introduce a novel concept of a global/core model 
into the PDE-based curve evolution framework by embedding the snake pedal 
model into a level-set framework. Instead of characterizing a shape boundary by 
the position of every point on the boundary, our proposed model, referred to as 
the hybrid geometric active model, now describes a shape as a combination of a 
global/core shape such as an ellipse, super-ellipse, etc. and a variable offset de- 
fined with respect to the global shape. The variable offsets are controlled by this 
global shape and an evolving curve — the controlling snake in the snake pedal 
model. For the model to recover the object boundary, we introduce a reliable
and efficient numerical method which consists of a global plus local shape estimation
technique in the model fitting. We use the Levenberg-Marquardt (LM) method
for estimating the global shape and a combination of up-wind and minmod finite
difference schemes in a level-set framework for estimating the local shape.
The hybrid geometric active contour/surface model retains all the advantages
of traditional geometric active models (for example, topology change, ability to
model complex geometries and amenability to stable numerical implementation)
and has the added advantage of being able to compactly represent the global
shape of an object. Augmenting the curve evolution framework with a global
shape/core will be very useful in shape learning/recognition and image indexing
applications.



1.2 Organization of the Paper 

In Section 2.1 we briefly discuss the snake pedal model introduced in m and
then present the novel hybrid geometric active models in Section 2.2. The numerical
issues of the model fitting process will be discussed in Section 2.3, followed
by the implementation results in Section 3.

2 Hybrid Geometric Active Model 

In this section we will first briefly review the snake pedal model - introduced in 
m - with the aid of 3D model fitting examples. We will then present the hybrid 
geometric active model which is obtained by replacing the snake in the snake 
pedal with a geometric/geodesic snake implemented in a level-set framework. 



2.1 The Snake Pedal Model 

Let $\alpha$ be a planar curve. The pedal curve of $\alpha$ is defined as the locus of
the foot $f$ of the perpendicular from a fixed point $p$, called the pedal
point, to a variable tangent of $\alpha$. Let $\beta$ be the pedal curve of $\alpha$ with respect
to the pedal point $p$, and let $\alpha(t) = g$, $\beta(t) = f$, as shown in Fig. 1. The
projection of $\alpha(t) - p$ in the direction $J\alpha'(t)$ must be $\beta(t) - p$, where $\alpha'(t)$ is
the tangent of the plane curve $\alpha(t)$ and $J : \mathbb{R}^2 \to \mathbb{R}^2$ is a linear map given by
$J(p_1, p_2) = (-p_2, p_1)$. $J$ can be geometrically interpreted as a rotation by $\pi/2$
in a counterclockwise direction. We can thus define a pedal curve as follows j2|:

Definition 1. The pedal curve of a regular curve $\alpha : (c, d) \to \mathbb{R}^2$ with respect
to a fixed (pedal) point $p \in \mathbb{R}^2$ is given by

$$\mathrm{pedal}[p, \alpha](t) = p + \frac{(\alpha(t) - p) \cdot J\alpha'(t)}{\|\alpha'(t)\|^2}\, J\alpha'(t). \tag{1}$$

Fig. 1. $f$ is on the pedal curve of $\alpha$ with respect to the pedal point $p$

In Fig. 2 we present examples depicting the pedal curves of an ellipse for different
positions of the pedal point (shown by a bold dot). Note that the pedal curve is
capable of exhibiting local as well as global deformations and that the
local deformation occurs in the locality of the pedal point. By moving the position
of the pedal point, it is possible to synthesize a variety of local deformations as
depicted in Fig. 2. The curve $\alpha(t)$ will be referred to as the generator for
the pedal curve $\beta(t)$, and the process of generating a pedal curve will be referred to
as the pedaling operation.
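
The pedaling operation for a fixed pedal point can be sketched numerically. The following is a minimal illustration, reading Definition 1 as $\beta(t) = p + \frac{(\alpha(t) - p)\cdot J\alpha'(t)}{\|\alpha'(t)\|^2} J\alpha'(t)$; the axis-aligned ellipse, its semi-axes and the pedal point are assumed values chosen for the example, and the function names are illustrative.

```python
import numpy as np

def ellipse(t, a=2.0, b=1.0):
    """Generator alpha(t) and tangent alpha'(t) of an axis-aligned ellipse."""
    alpha = np.stack([a * np.cos(t), b * np.sin(t)], axis=-1)
    dalpha = np.stack([-a * np.sin(t), b * np.cos(t)], axis=-1)
    return alpha, dalpha

def rotate90(v):
    """J(p1, p2) = (-p2, p1): counterclockwise rotation by pi/2."""
    return np.stack([-v[..., 1], v[..., 0]], axis=-1)

def pedal_curve(p, alpha, dalpha):
    """Feet of the perpendiculars from p to the tangent lines of alpha."""
    n = rotate90(dalpha)  # J alpha'(t), normal to the tangent
    coeff = np.sum((alpha - p) * n, axis=-1) / np.sum(dalpha**2, axis=-1)
    return p + coeff[..., None] * n

t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
alpha, dalpha = ellipse(t)
p = np.array([0.5, 0.3])  # hypothetical pedal point
beta = pedal_curve(p, alpha, dalpha)
```

Each point of `beta` lies on the tangent line through the corresponding generator point, and the segment from `p` to that point is perpendicular to the tangent, which is the defining property of the pedal curve.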



(a) (b) (c) (d) 



Fig. 2. Examples of pedal curves of an ellipse for different pedal point positions. Pedal 
points are shown by a dot in each case 



More general shapes may be synthesized by letting the pedal point be different
for each point of the generator. We can let the pedal points be specified
by another curve $p(t)$ represented by a standard snake |n| and then apply the
pedaling operation to each point on the generator $\alpha_i = \alpha(t_i)$ with respect
to the corresponding pedal point $p_i = p(t_i)$. The generator can be either a parameterized
or an implicit function representing a curve. The pedaling operation
generates a new curve that we dub a snake pedal $x(t)$, as shown in Fig. 3.
If the generator is an ellipse as shown in Fig. 3(a), we can represent it in a
parametric form by

where $a_1, a_2$ are aspect ratio parameters, $\theta$ is the rotation angle between the
intrinsic (material) coordinates and inertial coordinates, and $m = (m_1, m_2)^T$ is the
centroid of the generator in the world coordinates. We collect the generator
parameters into the global parameter vector $q = (a, b, \theta, m_1, m_2)^T$.

Fig. 3. (a) The process of generating a snake pedal with an ellipse generator, (b) “snake
pedal” controlled by the snake using an ellipse generator
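
A snake pedal with one pedal point per generator point can be sketched as follows. This is an illustration under stated assumptions rather than the formulation used in the paper: the ellipse parameterization (a rotation by $\theta$ followed by a translation by $m$), the wavy closed curve standing in for the snake, and all parameter values are hypothetical.

```python
import numpy as np

def ellipse_generator(t, a, b, theta, m):
    """Assumed generator: rotated, translated ellipse and its tangent."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    local = np.stack([a * np.cos(t), b * np.sin(t)], axis=-1)
    dlocal = np.stack([-a * np.sin(t), b * np.cos(t)], axis=-1)
    return local @ R.T + m, dlocal @ R.T

def snake_pedal(snake, alpha, dalpha):
    """Pedaling operation applied pointwise: pedal point p_i for alpha_i."""
    n = np.stack([-dalpha[:, 1], dalpha[:, 0]], axis=-1)  # J alpha'(t_i)
    coeff = np.sum((alpha - snake) * n, axis=1) / np.sum(n**2, axis=1)
    return snake + coeff[:, None] * n

t = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
q = dict(a=2.0, b=1.0, theta=np.pi / 6, m=np.array([0.5, -0.2]))
alpha, dalpha = ellipse_generator(t, **q)

# Hypothetical snake: a wavy closed curve sharing the parameter t.
radius = 1.5 + 0.3 * np.sin(5 * t)
snake = q["m"] + radius[:, None] * np.stack([np.cos(t), np.sin(t)], axis=-1)

x = snake_pedal(snake, alpha, dalpha)
```

Moving individual snake points changes only the corresponding points of `x`, while changing `q` moves the curve as a whole, which mirrors the local and global roles described above.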



In Fig. 4(a)-(b) we depict some examples of snake pedals, curves generated
using snakes and an ellipse as the generator. Note the variety of local deformations
that can be generated using this modeling technique. We remind the
reader that the snake pedal itself is a geometric model and that it is not directly
responsive to the application of external forces, unlike the standard snake models.



The pedal curve definition can be modified slightly by subtracting the second
term from the first in (1). This allows for larger local deformations including
shrinkage and expansion. A pedal surface is the surface analog of the pedal
curve. It is the locus of the foot of the perpendicular from a fixed
pedal point to a variable tangent plane of the surface. As in the 2D case, we can
let the pedal point vary for each point on the generator surface. Thus we have
the snake pedal surfaces in 3D whose shape can be controlled by snakes which
are either curves or surfaces in 3D m.

Fig. 4. Examples of “snake pedals” using an ellipse generator

Fitting the “snake pedal” to data is posed as a nonlinear minimization and
we use the Levenberg-Marquardt nonlinear optimization algorithm m in conjunction
with an efficient version of the alternating direction implicit (ADI)
technique m to achieve the fitting. Fig. 5 presents a model fitting example to
sparse 3D data points placed by an expert neuroscientist along the boundaries
of a gyrus in selected slices of an MR brain scan. Such a scenario arises in the
semi-automatic construction of anatomical models for possible use as a prior
model in shape recovery from unknown data. In this example, from left to right,
the first image depicts a slice of an MR brain scan in which the shape of interest,
a gyrus, has been identified by a neuroscientist via sparsely placed points on
the shape boundary. The next image shows the collection of these 3D points in
red and the initialized snake pedal model, followed by images depicting an
intermediate stage of fitting and the final fitted model. As is evident,
the model achieves a visually accurate fit to the data. In addition, the model fit
has been validated against manual segmentation by an expert neuroscientist.




(a) (b) (c) (d) 



Fig. 5. Left to right: MR brain scan depicting a region of interest (a gyrus), initialized 
model, intermediate fitting stage and final model fit 



2.2 Evolving Snake Pedals 

As described in the earlier sections, the traditional PDE-based geometric curve 
evolution algorithms do not provide a mechanism to characterize the global 
shape of an object. In this section, we describe how the level set formulation 
of the geometric curve evolution can be applied to our snake pedal model to 
realize topological changes (when necessary) as well as capture the global shape 
representation of an object. 

In our approach, to incorporate a smoothing constraint on the snake pedal, we 
impose regularization via Euclidean arc-length minimization of the snake pedal. 
This leads to the standard geodesic/geometric active contour. Let us consider
a snake pedal denoted by $\mathcal{P}_e(t)$, with its position denoted by $p_e = (p_{e1}, p_{e2})$;
then the standard geometric curve evolution formula for the snake pedal can be
written as

$$\frac{\partial p_e}{\partial t} = F(k_e)\, N_e, \tag{3}$$

where $k_e$ and $N_e$ are the curvature and normal of the snake pedal respectively.
We now examine how the “snake” would evolve if the snake pedal were evolving
as a function of its local curvature. We will not evolve the snake
pedal curve directly; instead, we will first solve for the snake position under the
constraint that the arc-length of the snake pedal is minimized, and the pedal
curve can then be determined by the pedaling operation defined in Section 2.1,
given the position of the generator. The problem can therefore be solved by the
following procedure:

1. Derive the curve evolution equation for the snake $\mathcal{P}(t)$ by minimizing the
arc-length of the snake pedal $\mathcal{P}_e(t)$;

2. Embed the evolving curve $\mathcal{P}(t)$ in a higher dimensional surface $\phi_1$ and formulate
the equation of motion for $\phi_1$;

3. Solve the equation of motion for $\phi_1$ using proper numerical techniques;

4. Determine the snake pedal curve from the evolving snake via the pedaling
operation, given the generator.




Fig. 6. Relation between $\mathcal{P}(t)$, $\mathcal{P}_e(t)$, $\phi_1$, and $\phi_2$: the curves are level-sets of the higher dimensional
surfaces $\phi_1$ and $\phi_2$ respectively, and are related by the pedaling operation



We remind the reader that in the above procedure, the snake pedal curve
also evolves, but it evolves as a standard geometric active contour embedded in
another higher dimensional surface $\phi_2$. We do not need to solve the equation
of motion for $\phi_2$ directly; instead, $\phi_2$ can be regarded as being implicit in the
procedure. Thus, we do not determine the level-set curve $\mathcal{P}_e(t)$ of the surface
$\phi_2$; instead, we first evaluate $\phi_1$ and determine its zero set, then we apply the
pedaling operation to obtain the snake pedal $\mathcal{P}_e(t)$. In addition, even though $\phi_2$
is not involved in the final algorithm, it is very useful in the derivation of the
governing PDE for the function $\phi_1$. Note that the snake model discussed here
is no longer the classical energy-minimizing model, since its deformation is not
obtained by minimizing some internal deformation energy, but we will still use
the name “snake” to represent the controlling active contours in the snake pedal
model. Fig. 6 depicts this important relationship between $\mathcal{P}(t)$, $\mathcal{P}_e(t)$, $\phi_1$, and
$\phi_2$.

Note that an important feature of this modeling technique is the incorporation 
of a global parameterized shape, namely, the generator, into the curve evolution 
framework. As mentioned earlier, this global parameterized shape can
be very useful in applications involving object recognition as well as in shape
learning (by collecting statistics on the global shape parameters of the model). 
Since a geometric active contour is used to represent the controlling “snake” in 
the model, and also the model can capture the “global orientation” of an object 
or a group of objects, this snake pedal model is referred to as a hybrid geometric 
active contour model. Detailed discussion of the related concepts will be given in 
the following sections, starting with the derivation of the relationship between 
the snake and the snake pedal evolution. 



Relation Between the Snake and Snake Pedal Curve Evolution. Consider
a snake $p = (p_1, p_2)^T$ and a generator $\alpha = (\alpha_1, \alpha_2)^T$ with normal
$n = (n_1, n_2)^T$, where $p$ and $\alpha$ are related by an association scheme - a radial association
produced by using the same parameterization for both the curves 0. We
obtain the corresponding snake pedal $p_e = (p_{e1}, p_{e2})^T$ via the following pedaling
operation:

$$p_e = p - \frac{(\alpha - p) \cdot n}{\|n\|^2}\, n, \tag{4}$$

or simply,

$$p_e = J_A\, p - b, \tag{5}$$

where

$$J_A = \begin{pmatrix} 1 + \dfrac{n_1^2}{n_1^2 + n_2^2} & \dfrac{n_1 n_2}{n_1^2 + n_2^2} \\ \dfrac{n_1 n_2}{n_1^2 + n_2^2} & 1 + \dfrac{n_2^2}{n_1^2 + n_2^2} \end{pmatrix} \tag{6}$$

and

$$b = \frac{\alpha_1 n_1 + \alpha_2 n_2}{n_1^2 + n_2^2} \begin{pmatrix} n_1 \\ n_2 \end{pmatrix}. \tag{7}$$

Let the inverse of $J_A$ be denoted by $J_B$, which is also a function of $n_1$ and $n_2$.
We remind the reader that because the generator does not evolve over time,
the forms of $J_B$ and $b$ do not change over time; their contents change only
during the iterative estimation process. Thus, we can treat $J_B$ as a constant
matrix and $b$ as a constant vector with respect to time when evolving $p(t)$ or
$p_e(t)$. Therefore, (5) can be explicitly written as a function of time:

$$p_e(t) = J_A\, p(t) - b. \tag{8}$$

Taking the partial derivative on both sides of (8) with respect to time $t$ yields

$$\frac{\partial p_e(t)}{\partial t} = J_A \frac{\partial p(t)}{\partial t}, \quad \text{or} \quad \frac{\partial p(t)}{\partial t} = J_B \frac{\partial p_e(t)}{\partial t}. \tag{9}$$

Equation (9) reveals a very crucial relationship between the snake and snake
pedal evolution: the evolutions of the two curves are linearly related by a Jacobian
matrix. We remind the reader that the relationship between the snake and snake
pedal is not linear, but their evolutions over time are related by a linear
transformation.
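
The equivalence of the two forms of the pedaling operation can be checked numerically. The sketch below reads (4) as $p_e = p - \frac{(\alpha - p)\cdot n}{\|n\|^2} n$, (6) as $J_A = I + nn^T/\|n\|^2$ and (7) as $b = \frac{\alpha \cdot n}{\|n\|^2} n$; the closed form used for $J_B$ follows from that reading and is an assumption, as are the sample points.

```python
import numpy as np

def pedal_matrices(alpha, n):
    """J_A and b as read from (6)-(7), and J_B = J_A^{-1}, for one generator point."""
    nn = np.outer(n, n) / np.dot(n, n)   # projector onto the normal direction
    J_A = np.eye(2) + nn
    b = (np.dot(alpha, n) / np.dot(n, n)) * n
    J_B = np.eye(2) - 0.5 * nn           # closed-form inverse of I + nn^T/||n||^2
    return J_A, b, J_B

alpha = np.array([2.0, 0.5])   # hypothetical generator point
n = np.array([0.6, 1.3])       # hypothetical (unnormalised) generator normal
p = np.array([1.2, -0.4])      # hypothetical snake point

J_A, b, J_B = pedal_matrices(alpha, n)
p_e_direct = p - (np.dot(alpha - p, n) / np.dot(n, n)) * n   # form (4)
p_e_affine = J_A @ p - b                                     # form (5)
```

Because $J_A$ and $b$ depend only on the generator, the map from `p` to `p_e` is affine in the snake position, so velocities are related by $J_A$ alone, as in the time-derivative relation above.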



PDE for the Snake and Snake Pedal Evolution. When embedding the
snake $\mathcal{P}(t)$ as the zero set of a higher dimensional surface $\phi_1$, we have:

$$\{\mathcal{P}(t) \in \mathbb{R}^2 : \phi_1(p, t) = 0\}. \tag{10}$$

Differentiating (10) with respect to $t$ yields: $\partial\phi_1/\partial t + \nabla\phi_1 \cdot \partial p/\partial t = 0$. Similarly, for the
higher dimensional function $\phi_2$, we have the following formula:
$\partial\phi_2/\partial t + \nabla\phi_2 \cdot \partial p_e/\partial t = 0$. From the discussion in the previous section, we have

$$\frac{\partial p_e}{\partial t} = J_A \frac{\partial p}{\partial t}, \quad \text{or} \quad \frac{\partial p}{\partial t} = J_B \frac{\partial p_e}{\partial t}. \tag{11}$$

Our aim is to minimize the arc-length of the snake pedal,
which amounts to imposing a smoothness constraint on the snake pedal curve;
hence we require that $\partial p_e/\partial t = F(k_e) N_e$, where $F(k_e)$ is the speed function,
which depends on the curvature $k_e$ of the snake pedal, and $N_e$ is the unit normal of
the snake pedal. Similar to the relation between the normal of the snake and its
higher dimensional function, $N = -\nabla\phi_1/\|\nabla\phi_1\|$, for the zero level set of $\phi_2$ the
following relation holds: $N_e = -\nabla\phi_2/\|\nabla\phi_2\|$. Combining the above three equations
we obtain

$$\frac{\partial\phi_2}{\partial t} = F(k_e)\,\|\nabla\phi_2\|. \tag{12}$$

Equation (12) is the standard level-set evolution. Similarly, for the governing
equation of the snake curve, we have:

$$\frac{\partial\phi_1}{\partial t} = F(k_e)\,\nabla\phi_1 \cdot J_B \frac{\nabla\phi_2}{\|\nabla\phi_2\|}. \tag{13}$$

The above two equations, (13) and (12), are the equations of motion for $\phi_1$ and $\phi_2$, respectively.
We rewrite them together as follows:

$$\frac{\partial\phi_1}{\partial t} = F(k_e)\,\nabla\phi_1 \cdot J_B \frac{\nabla\phi_2}{\|\nabla\phi_2\|}, \qquad \frac{\partial\phi_2}{\partial t} = F(k_e)\,\|\nabla\phi_2\|. \tag{14}$$



Eq. (14) represents the PDEs for the evolution of $\phi_1$ and $\phi_2$ (note that they are
not coupled). One approach to solving these PDEs is to use a combination of central
and upwind finite differences COl to solve the first equation in every iteration,
and then solve the second equation using a similar method subsequently. We
will discuss the up-wind finite difference method in Section 2.3. This approach
is straightforward, but needs large amounts of storage for both $\phi_1$ and $\phi_2$. Noting
that in (14) only the gradients of $\phi_1$ and $\phi_2$ are involved on the right hand sides
of both equations, we propose a more elegant approach to solve the PDEs in
(14) with much less storage, by employing the intrinsic relation between $\nabla\phi_1$
and $\nabla\phi_2$ as discussed in P). In P], a surprisingly simple relation between $\phi_1$ and
$\phi_2$ is obtained after a tedious derivation. This relationship is given by

$$\nabla\phi_1 = J_A \nabla\phi_2, \quad \text{or} \quad \nabla\phi_2 = J_B \nabla\phi_1, \tag{15}$$



where the Jacobians between the evolution of the snake and snake pedal curve,
$J_A$ and $J_B = J_A^{-1}$, are defined as before. At first glance, this relation seemingly
contradicts our intuition. One would think that since $\partial p_e/\partial t = J_A\, \partial p/\partial t$, if
there is any Jacobian between the gradients of $\phi_1$ and $\phi_2$, the relation should be
$\nabla\phi_2 = J_A \nabla\phi_1$, rather than $\nabla\phi_1 = J_A \nabla\phi_2$.

Since our objective is to obtain the equation of motion for the higher dimensional
surface $\phi_1$, we substitute $\phi_2$ with $\phi_1$ in (14) to get

$$\frac{\partial \phi_1}{\partial t} = F(k_e)\, \|J_B \nabla\phi_1\|. \tag{16}$$

The representation of $k_e$ in terms of $\phi_1$ is quite complicated; we refer the reader
to 0 for these details. Equation (16) is the equation of motion for $\phi_1$ and
constitutes the primary equation in our application. When using the snake pedal
model for recovery of the shape of interest from image data, we need to solve
the more general equation of motion for the higher dimensional function $\phi_1$:
$\partial\phi_1/\partial t = g(x,y)\, F(k_e)\, \|J_B \nabla\phi_1\| + \nabla\phi_1 \cdot \nabla g$, where $g(x,y)$ is an image feature
based function and is used to stop the curve evolution when the contour is close
to the desired edges. $\nabla\phi_1$ and $\nabla g$ must be evaluated at locations on the snake
and the snake pedal, respectively.
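
The substitution that leads to the equation of motion for $\phi_1$ can be written out in one line. The following is a sketch, assuming the snake equation has the form $\partial\phi_1/\partial t = F(k_e)\,\nabla\phi_1 \cdot J_B \nabla\phi_2 / \|\nabla\phi_2\|$, that $\nabla\phi_2 = J_B \nabla\phi_1$, and that $J_B$ is symmetric (which holds if $J_A = I + nn^T/\|n\|^2$):

```latex
\begin{aligned}
\frac{\partial\phi_1}{\partial t}
 &= F(k_e)\,\frac{\nabla\phi_1 \cdot J_B \nabla\phi_2}{\|\nabla\phi_2\|}
  = F(k_e)\,\frac{\nabla\phi_1 \cdot J_B J_B \nabla\phi_1}{\|J_B\nabla\phi_1\|} \\
 &= F(k_e)\,\frac{(J_B\nabla\phi_1)\cdot(J_B\nabla\phi_1)}{\|J_B\nabla\phi_1\|}
  = F(k_e)\,\|J_B\nabla\phi_1\| .
\end{aligned}
```

The second line uses the symmetry of $J_B$ to move one factor of $J_B$ onto $\nabla\phi_1$, so that $\phi_2$ no longer appears and only $\phi_1$ needs to be stored.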



2.3 Numerical Solution 

In this section, we discuss application of the snake pedal to recover the boundaries
of the shape of interest from an image using a novel global plus local shape
estimation procedure. For the global shape estimation, we employ the well known
Levenberg-Marquardt (LM) method, while for the local shape estimation, we
present a modified level-set method.

In Section 2.2 we illustrated that the solution for the snake pedal evolution
can be achieved by first solving the snake evolution, which is embedded
in a higher dimensional surface $\phi_1$, then applying the pedaling operation on
the snake. Therefore, developing a reliable and efficient numerical algorithm for
solving the governing equation of $\phi_1$, i.e., (16), is the primary task in applying
the snake pedal model for extracting shapes of interest from image data. The
governing equation of motion for $\phi_1$ in its simplest form is given in (16).

As discussed in mu. the speed function $F(\cdot)$ in (16) consists of two terms,
namely, the advection term and the diffusion term Fq. The diffusion term
smooths the curve while the advection term may result in singularities during the
curve evolution even with smooth initial data. A variety of entropy-satisfying
algorithms have been proposed to evolve the curve beyond the formation of
singularities.

In our numerical approach to solving the equation of motion (16), for the
diffusion term we use the standard central difference approximation, whereas
for the advection term we need to solve the following hyperbolic initial-value
problem:

$$\phi_{1t} = \sqrt{a(x,y)\,\phi_{1x}^2 + b(x,y)\,\phi_{1y}^2 + c(x,y)\,\phi_{1x}\phi_{1y}}, \tag{17}$$

where $a(x,y)$, $b(x,y)$ and $c(x,y)$ are determined by the entries of $J_B$ and do
not change over time. $\phi_{1x}^2$ and $\phi_{1y}^2$ can be approximated by the upwind finite
difference scheme discussed in m. But for the $\phi_{1x}\phi_{1y}$ term, we use the minmod
finite difference approximation discussed in Kimmel et al. Pj. The minmod finite
derivative is defined as:

$$\mathrm{minmod}\{a, b\} = \begin{cases} \mathrm{sign}(a)\,\min(|a|, |b|) & \text{if } ab > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

Using this definition to approximate $\phi_{1x}\phi_{1y}$ leads to

$$\phi_{1x}\phi_{1y}\big|_{x = i\Delta x} = \mathrm{minmod}\{D_x^+\phi_1, D_x^-\phi_1\}\;\mathrm{minmod}\{D_y^+\phi_1, D_y^-\phi_1\}, \tag{19}$$

where $D_x^-$ and $D_x^+$ are defined as

$$D_x^-\phi_{1i} = \frac{\phi_{1i} - \phi_{1(i-1)}}{\Delta x}, \qquad D_x^+\phi_{1i} = \frac{\phi_{1(i+1)} - \phi_{1i}}{\Delta x}, \tag{20}$$

and similar definitions apply to $D_y^+$ and $D_y^-$.
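
The minmod operator and the one-sided differences can be sketched in a few lines. The code below is an illustration under assumptions that are not taken from the paper: the grid is a 2D array with axis 1 as $x$ and axis 0 as $y$, boundaries are treated as periodic for brevity, and the function names are illustrative.

```python
import numpy as np

def minmod(a, b):
    """sign(a) * min(|a|, |b|) where a and b share a sign, 0 otherwise."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.where(a * b > 0, np.sign(a) * np.minimum(np.abs(a), np.abs(b)), 0.0)

def one_sided_differences(phi, h=1.0, axis=0):
    """Backward (D-) and forward (D+) differences with periodic wrap-around."""
    d_minus = (phi - np.roll(phi, 1, axis=axis)) / h
    d_plus = (np.roll(phi, -1, axis=axis) - phi) / h
    return d_minus, d_plus

def mixed_term(phi, dx=1.0, dy=1.0):
    """Minmod approximation of the product phi_x * phi_y."""
    dxm, dxp = one_sided_differences(phi, dx, axis=1)
    dym, dyp = one_sided_differences(phi, dy, axis=0)
    return minmod(dxp, dxm) * minmod(dyp, dym)
```

On a linear field such as `phi = 2*x + 3*y` the interior values of `mixed_term(phi)` equal 6, since both one-sided differences agree; where the forward and backward differences disagree in sign (a local extremum) the term is set to zero.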

Combining these finite differences yields a first order numerical scheme for
solving (17), the advection term in (16):

$$\Big[\, a_{ij}\big((\max(D_x^-\phi_{1(ij)}, 0))^2 + (\min(D_x^+\phi_{1(ij)}, 0))^2\big) + b_{ij}\big((\max(D_y^-\phi_{1(ij)}, 0))^2 + (\min(D_y^+\phi_{1(ij)}, 0))^2\big) + c_{ij}\,\mathrm{minmod}(D_x^+\phi_{1(ij)}, D_x^-\phi_{1(ij)})\,\mathrm{minmod}(D_y^+\phi_{1(ij)}, D_y^-\phi_{1(ij)}) \Big] \tag{21}$$

This numerical scheme is stable and provides a natural way to handle the topological
changes in the snake evolution.

3 Implementation Results 

In this section, we present the level-set implementation of the hybrid geometric
active model with the ellipse and super-ellipse generators described in Section 2.1,
respectively. The examples contain 2D slices from an MR scan of the human heart.
In each row of Fig. 7, from left to right, images show the model initializations,
intermediate stages of fitting and the final model fits. The model is initialized to
capture the endocardium structure. In the top row, we used an ellipse generator
and in the bottom row, a super-ellipse was used. As is evident from the results, the
super-ellipse captures the global shape better for this example. The snake pedal
is shown in green (light gray) and the global shape in red (dark gray). We use an
image-based speed function to deter the model evolution in both examples.





Fig. 7. Hybrid geometric active contour fitting examples: Left to right, model initializa- 
tion (snake pedal in green (light gray) and generator in red (dark gray)), intermediate 
stages of evolution and final fit. The first row uses an ellipse generator while the second 
row uses a super-ellipse generator



Fig. 8 depicts a topological change example for synthetic data. From left to
right, images depict model initialization, intermediate stage of evolution and final
fit respectively. In this example, the snake pedal is initialized as a single small
ellipse; as the fitting proceeds, the model expands and splits, and finally fits
all the object contours in the whole image. The global shape of the generator
is not shown here since the “global” shape is not very meaningful
in these examples.

By replacing the ellipse generator with a super-ellipse, we can obtain a more 
general/powerful representation of the global shape in the snake pedal model. 

In all these examples, we implemented the level-set form of the equation:

$$\frac{\partial\phi_1}{\partial t} = g(x,y)\, F(k_e)\, \|J_B\nabla\phi_1\| + \nabla\phi_1 \cdot \nabla g,$$

where $g(x,y) = 1/(1 + \|\nabla(G * I)\|^2/K^2)$, with $G * I$ being a Gaussian convolved
with the image and $K$ being a scaling constant. More sophisticated stopping
criteria may be synthesized to yield better accuracy in shape recovery. We use
the upwind difference and minmod difference method described in Section 2.3
to implement this Hamilton-Jacobi equation of motion.
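
An edge-stopping function of this kind can be sketched with NumPy alone. The exact exponents in the printed formula are not fully legible, so the form $g = 1/(1 + \|\nabla(G * I)\|^2/K^2)$ used below is an assumption, as are the Gaussian width `sigma`, the default `K`, and the function names.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(image, sigma):
    """Separable Gaussian smoothing G * I with edge padding."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    out = np.pad(image.astype(float), pad, mode="edge")
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)
    return out

def stopping_function(image, sigma=1.5, K=10.0):
    """Assumed form g = 1 / (1 + |grad(G * I)|^2 / K^2); near 1 in flat regions."""
    gy, gx = np.gradient(smooth(image, sigma))
    return 1.0 / (1.0 + (gx**2 + gy**2) / K**2)
```

For an image with a single vertical step edge, `stopping_function` stays at 1 away from the edge and drops towards 0 across it, which is what slows the contour near object boundaries.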




Fig. 8. Topological change examples with hybrid geometric active contour models: Left 
to right, initialization (snake pedal in yellow (white)), intermediate stages of evolution 
and final fit 



4 Conclusions 

In this paper, we proposed novel extensions to a powerful geometric shape 
modeling scheme called snake pedals, introduced in earlier work. The extensions 
involved methods for automatically coping with topological changes and, for the 
first time, the introduction of the concept of a global shape into geometric/geodesic snake 
models. The ability to characterize global shape of an object using very few 
parameters facilitates shape learning and recognition. Unlike the deformable 
superquadrics, the geometric snake pedals have the ability to cope with large 
bending and twists in a shape without explicitly introducing parameters to char- 
acterize the same. This leads to reduced numerical complexity and increased nu- 
merical stability in the resulting shape recovery algorithms used. The modeling 
scheme was applied to recover shapes of interest from a variety of medical image 
data using numerically stable algorithms. 

Abstract. Automatic and semi-automatic magnetic resonance angiography 
(MRA) segmentation techniques can potentially save radiologists 
large amounts of time required for manual segmentation and can facilitate 
further data analysis. The proposed MRA segmentation method uses 
a mathematical modeling technique which is well-suited to the complicated 
curve-like structure of blood vessels. We define the segmentation 
task as an energy minimization over all 3D curves and use a level set 
method to search for a solution. Our approach is an extension of previous 
level set segmentation techniques to higher co-dimension. 



1 Introduction 

The high-level goal of this research is to develop computer vision techniques 
for the segmentation of medical images. Automatic and semi-automatic vision 
techniques can potentially assist clinicians in this task, saving them much of the 
time required to manually segment large data sets. Specifically, we consider the 
segmentation of volumetric vasculature images, such as the magnetic resonance 
angiography (MRA) image pictured in Fig. 1. 

As shown here, blood vessels appear in MRA images as bright curve-like 
patterns which may be noisy and have gaps. What is shown is a “maximum 
intensity projection”. The data is a stack of slices where most areas are dark, 
but vessels tend to be bright. This stack is collapsed into a single image for 
viewing by performing a projection through the stack that assigns to each pixel 
in the projection the brightest voxel over all slices. This image shows projections 
along three orthogonal axes. 
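
The projection described above can be sketched in a few lines. This is a minimal illustration, assuming the volume is stored as nested Python lists indexed `[z][y][x]`; the function name `max_intensity_projection` and the toy volume are hypothetical and not taken from the paper.

```python
def max_intensity_projection(volume, axis):
    """Collapse a 3D volume (nested lists indexed [z][y][x]) along one axis,
    keeping the brightest voxel met along each projection ray."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:  # project through the stack of slices (along z)
        return [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)]
                for y in range(ny)]
    if axis == 1:  # project along y
        return [[max(volume[z][y][x] for y in range(ny)) for x in range(nx)]
                for z in range(nz)]
    if axis == 2:  # project along x
        return [[max(volume[z][y][x] for x in range(nx)) for y in range(ny)]
                for z in range(nz)]
    raise ValueError("axis must be 0, 1, or 2")


# Toy 2x2x2 volume: two bright voxels in a dark background.
toy = [[[0, 0], [0, 9]],
       [[1, 0], [0, 0]]]
mip_z = max_intensity_projection(toy, 0)
```

Calling the function once per axis gives the three orthogonal views mentioned in the text.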

Thresholding is one possible approach to this segmentation problem and 
works adequately on the larger vessels. The problem arises in detecting the small 
vessels, and that is the objective of our work. Thresholding cannot be used for 
the small vessels for several reasons. The voxels may have an intensity that is 
a combination of the intensities of vessels and background if the vessel is only 
partially inside the voxel. This sampling artifact is called partial voluming. Other 
imaging conditions can cause some background areas to be as bright as other 
vessel areas, complicating threshold selection. Finally, the images are often noisy, 
and methods using local contextual information can be more robust. 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 126-[|33 1999. 

Our method uses the fact that the underlying structures in the image are 
indeed 3D curves and evolves an initial curve into the curves in the data (the 
vessels). In particular, we explore techniques based on the concept of mean 
curvature flow, or curve-shortening flow, from the field of differential geometry. 




Fig. 1. Maximum intensity projection of a phase-contrast MRA image of blood vessels 
in the brain 



2 Curvature Evolution Methods 

Mean curvature evolution schemes for segmentation, implemented with level set 
methods, have become an important approach in computer vision. This 
approach uses partial differential equations to control the evolution. Overviews 
of the superset of techniques using related partial differential equations are 
available in the literature. The fundamental concepts from mathematics from which 
mean curvature schemes derive were explored several years earlier, when smooth 
closed curves in 2D were proven to shrink to a point under mean curvature motion. 
Evans and Spruck, and Chen, Giga, and Goto, independently framed mean 
curvature flow of any hypersurface as a level set problem and proved existence, 
uniqueness, and stability of viscosity solutions. For application to image 
segmentation, a vector field was induced on the embedding space, so that the 
evolution could be controlled by an image gradient field or other image data. The 
same results of existence, uniqueness, and stability of viscosity solutions were 
obtained for the modified evolution equations for the case of planar curves, and 
experiments on real-world images demonstrated the effectiveness of the approach. 

Curves evolving in the plane became surfaces evolving in space, called minimal 
surfaces. Although the theorem on planar curves shrinking to a point 
could not be extended to the case of surfaces evolving in 3D, the existence, 
uniqueness, and stability results of the level set formalism held analogously to 
the 2D case. Thus the method was feasible for evolving both curves in 2D and 
surfaces in 3D. Beyond elegant mathematics, spectacular results on real-world 
data sets established the method as an important segmentation tool in both do- 
mains. One fundamental limitation to these schemes has been that they describe 
only the flow of hypersurfaces, i.e., surfaces of co-dimension 1. 

Altschuler and Grayson studied the problem of curve-shortening flow for 
3D curves, and Ambrosio and Soner generalized the level set technique to 
arbitrary manifolds in arbitrary dimension. They provided the analogous results 
and extended their level set evolution equation to account for an additional 
vector field induced on the space. 

We herein present the first implementation of geodesic active contours in 
3D, based on Ambrosio and Soner’s work. Specifically, our system uses these 
techniques for automatic segmentation of blood vessels in MRA images. The 
dimension of the manifold is 1, and its co-dimension is 2. 

3 Mean Curvature Flow 

Intuitively, mean curvature flow refers to some curve evolving in time so that at 
each point, the velocity vector normal to the curve is equal to the mean curvature 
vector. This concept is normally defined for arbitrary generic surfaces, but only 
curves are necessary for this paper, so we have restricted the definition. More 
formally, let $C(t)$, $t \geq 0$, be a family of curves in $\mathbb{R}^2$ or $\mathbb{R}^3$, and N the normal for a 
given orientation. That is, C is a curve, and t represents the “time” parameter 
or the index into the family of curves, not position. The mean curvature flow 
equation is then given by the vector equation 

$$C_t = \kappa N \qquad (1)$$

with given initial curve $C(0) = C_0$, $\kappa$ the curvature of the curve, and $C_t$ the 
time derivative of the curve. Note that since we consider only 1D curves here, 
as opposed to evolving surfaces, the mean curvature is just the usual curvature 
of the curve. This motion is also called “curve-shortening flow” since it is the 
solution, obtained by Euler-Lagrange equations, to the problem of minimizing 
curve length: 

$$\min \int |C'(p)| \, dp$$

where p is the spatial parameter of the curve. 
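
As a rough numerical illustration of Equation (1), the sketch below moves the vertices of a closed polygon by a finite-difference estimate of the second derivative with respect to arc length, which approximates $\kappa N$. It is only a sketch under stated assumptions: the explicit Euler time step, the polygon discretization, and the names `curve_length` and `shorten_step` are illustrative choices, not the implementation used in the paper.

```python
import math


def curve_length(points):
    """Perimeter of a closed polygon given as a list of (x, y) vertices."""
    n = len(points)
    return sum(math.hypot(points[(i + 1) % n][0] - points[i][0],
                          points[(i + 1) % n][1] - points[i][1])
               for i in range(n))


def shorten_step(points, dt):
    """One explicit Euler step of C_t = kappa * N on a closed polygon.

    kappa * N is approximated by (p_prev - 2 p + p_next) / h^2, where h^2 is
    the mean squared distance to the two neighbouring vertices.
    """
    n = len(points)
    out = []
    for i in range(n):
        xp, yp = points[i - 1]
        x, y = points[i]
        xn, yn = points[(i + 1) % n]
        h2 = 0.5 * ((x - xp) ** 2 + (y - yp) ** 2
                    + (xn - x) ** 2 + (yn - y) ** 2)
        out.append((x + dt * (xp - 2 * x + xn) / h2,
                    y + dt * (yp - 2 * y + yn) / h2))
    return out


# Unit circle sampled at 40 points; one small step shrinks it.
circle = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40))
          for i in range(40)]
after = shorten_step(circle, 0.01)
```

For a regularly sampled circle of radius r this update reduces to r - dt / r, which matches the expected behaviour of a circle shrinking under its own curvature.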

4 Level Set Method for Planar Curves 

We give the basic idea of the level set method to evolve a planar curve C. 
Define a function $u : \mathbb{R}^2 \to \mathbb{R}$ so that C is a level set of u. We follow the 
convention that C is, in particular, the zero level set of u, although this choice is not 
necessary for the method. The function u is now an implicit representation of 
the curve C. The advantages of this representation are that it is intrinsic 
(independent of parameterization) and that it is topologically flexible since different 
topologies of C are represented by the constant topology of u. Let $C_0$ be the 
initial curve. 

It has been shown that evolving C according to 

$$C_t = \beta N \qquad (2)$$

with initial condition $C(\cdot, 0) = C_0(\cdot)$, for any function $\beta$, is equivalent to evolving 
u according to 

$$u_t = \beta |\nabla u| \qquad (3)$$

with initial condition $u(\cdot, 0) = u_0(\cdot)$ and $u_0(C_0) = 0$. 




Fig. 2. Level sets of an embedding function u, for a closed curve in $\mathbb{R}^2$ 



This result is independent of the choice of function u. As customary in 
the literature, we choose $u_0$ to be the signed distance function to the curve C 
(Fig. 2). 
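
The embedding can be made concrete with a small grid example. The sketch below builds a signed distance function to a circle and applies one explicit Euler step of Equation (3) with constant $\beta$. It is a sketch under assumptions: the sign convention (negative inside the curve), the central differences, and the names `signed_distance_circle` and `evolve_step` are illustrative only; practical level set codes use upwind schemes rather than central differences.

```python
import math


def signed_distance_circle(n, spacing, radius):
    """u0 on an n x n grid centred at the origin, negative inside the circle."""
    c = (n - 1) / 2.0
    return [[math.hypot((i - c) * spacing, (j - c) * spacing) - radius
             for j in range(n)] for i in range(n)]


def evolve_step(u, beta, dt, spacing):
    """One explicit Euler step of u_t = beta * |grad u| on interior points."""
    n = len(u)
    new_u = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            ux = (u[i + 1][j] - u[i - 1][j]) / (2 * spacing)
            uy = (u[i][j + 1] - u[i][j - 1]) / (2 * spacing)
            new_u[i][j] = u[i][j] + dt * beta * math.hypot(ux, uy)
    return new_u


u0 = signed_distance_circle(41, 0.1, 1.0)
u1 = evolve_step(u0, 1.0, 0.05, 0.1)
```

Because a signed distance function has unit gradient magnitude away from the centre, one step adds roughly `beta * dt` to u, so the zero level set moves from radius 1.0 to about 0.95 without the curve ever being parameterized.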



5 Level Set Method for Curves in Higher Codimension 

The level set evolution equations that follow were proven by Ambrosio and Soner. They enable us 
to evolve space curves, with evolution driven by both mean curvature and image 
information. In the following discussion, C is a curve in 3D. 

5.1 Mean Curvature Flow 

Let $v : \mathbb{R}^3 \to [0, \infty)$ be an auxiliary function whose zero level set is identically C, 
that is smooth near C, and such that $\nabla v$ is non-zero outside C. For a nonzero 
vector $q \in \mathbb{R}^n$, define 

$$P_q = I - \frac{q q^T}{|q|^2}$$

as the projector onto the plane normal to q. Further define $\lambda(\nabla v(x, t), \nabla^2 v(x, t))$ 
as the smaller nonzero eigenvalue of $P_{\nabla v} \nabla^2 v P_{\nabla v}$. The level set evolution 
equation is then 

$$v_t = \lambda(\nabla v(x, t), \nabla^2 v(x, t)).$$

That is, this evolution is equivalent to evolving C according to $C_t = \kappa N$ in the 
sense that C is the zero level set of v throughout the evolution. 

Fig. 3. Evolving curves under mean curvature flow. The first three images show a circle 
shrinking to a point, and the last two images show a helix shrinking to its axis 


Figure 3 demonstrates this evolution. As discussed above, a circle shrinks to 
a point under mean curvature motion. Under this motion, a helix evolves into 
its axis. 
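
To make the eigenvalue in the evolution equation concrete, the following sketch evaluates it numerically for the distance function to a circle of radius 2 in the plane z = 0. Restricting the Hessian to the plane normal to the gradient is equivalent to forming the projected matrix and discarding the zero eigenvalue along the gradient. Everything here is an illustrative assumption: the finite-difference derivatives, the test function, and the helper names are not from the paper.

```python
import math


def tube_distance(x, y, z, big_r=2.0):
    """Distance from (x, y, z) to a circle of radius big_r in the plane z = 0."""
    return math.hypot(math.hypot(x, y) - big_r, z)


def gradient(f, p, h=1e-3):
    """Central-difference gradient of f at point p."""
    g = []
    for a in range(3):
        hi, lo = list(p), list(p)
        hi[a] += h
        lo[a] -= h
        g.append((f(*hi) - f(*lo)) / (2 * h))
    return g


def hessian(f, p, h=1e-3):
    """Finite-difference Hessian of f at point p (four-point stencil)."""
    def shifted(a, b, da, db):
        q = list(p)
        q[a] += da * h
        q[b] += db * h
        return f(*q)
    return [[(shifted(a, b, 1, 1) - shifted(a, b, 1, -1)
              - shifted(a, b, -1, 1) + shifted(a, b, -1, -1)) / (4 * h * h)
             for b in range(3)] for a in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def unit(a):
    norm = math.sqrt(sum(c * c for c in a))
    return [c / norm for c in a]


def smaller_eigenvalue(f, p):
    """Smaller eigenvalue of the Hessian restricted to the plane normal to grad f."""
    n = unit(gradient(f, p))
    helper = [1.0, 0.0, 0.0] if abs(n[0]) < 0.9 else [0.0, 1.0, 0.0]
    e1 = unit(cross(n, helper))      # first basis vector of the normal plane
    e2 = cross(n, e1)                # second basis vector, already unit length
    hess = hessian(f, p)

    def quad(a, b):
        return sum(a[i] * hess[i][j] * b[j] for i in range(3) for j in range(3))

    m11, m12, m22 = quad(e1, e1), quad(e1, e2), quad(e2, e2)
    mean = 0.5 * (m11 + m22)
    disc = math.sqrt((0.5 * (m11 - m22)) ** 2 + m12 ** 2)
    return mean - disc


lam = smaller_eigenvalue(tube_distance, (2.5, 0.0, 0.0))
```

At the point (2.5, 0, 0) the level set is a torus of tube radius 0.5, whose principal curvatures are 1/0.5 and 1/2.5. The function returns approximately 0.4, the smaller of the two, which tends to the curvature of the circle itself as the tube radius goes to zero.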



5.2 Incorporation of Vector Field 

This section discusses the situation where there is an underlying vector field 
driving the evolution, in combination with the curvature term. Assume the desired 
evolution equation is of the form 

$$C_t = \kappa N - \Pi d,$$

where $\Pi$ is the projection operator onto the normal space of C (which is a vector 
space of dimension 2) and d is a given vector field in $\mathbb{R}^3$. The evolution equation 
for the embedding space then becomes 

$$v_t = \lambda(\nabla v, \nabla^2 v) + \nabla v \cdot d.$$



5.3 3D Image Segmentation 

For the case of 1D structures in 3D images, we wish to minimize 

$$\int_0^1 g(|\nabla I(C(p))|) \, |C'(p)| \, dp$$

where $C(p) : [0, 1] \to \mathbb{R}^3$ is the 1D curve, $I : [0, a] \times [0, b] \times [0, c] \to [0, \infty)$ 
is the image, and $g : [0, \infty) \to \mathbb{R}^+$ is a strictly decreasing function such that 
$g(r) \to 0$ as $r \to \infty$. For our current implementation, we use 
$g(r) = \exp(-r)$ because it works well in practice. Another common choice is 
$g(|\nabla I|) = 1/(1 + |\nabla I|^2)$. By computing the Euler-Lagrange equations, we find that 
the curve evolution equation is 

$$C_t = \kappa N - \frac{g'}{g} \, \Pi\!\left(H \frac{\nabla I}{|\nabla I|}\right) \qquad (4)$$

where H is the Hessian of the intensity function. The second term in the above 
equation is illustrated in Fig. 4(a). That is, 

$$d = \frac{g'}{g} \, H \frac{\nabla I}{|\nabla I|},$$

so the equation for the embedding space is 

$$v_t = \lambda(\nabla v(x, t), \nabla^2 v(x, t)) + \frac{g'}{g} \, \nabla v(x, t) \cdot H \frac{\nabla I}{|\nabla I|} \qquad (5)$$

Fig. 4. (a) The tangent to C at p, the normal plane, the image-based vector, and its 
projection onto the normal plane, (b) ε-level set method 

Fig. 5. Evolving helix under mean curvature flow with additional vector field: target 
curve, initial level set, level set after evolution with endpoints constrained 



Thus, Ambrosio and Soner’s work has provided the basis for the use of mean 
curvature flow and level set methods to segment 1D structures in 3D. Figure 5 
illustrates how underlying image information can attract the evolving tube. The 
underlying volumetric image data is shown, as a maximum intensity projection, 
in the first image. This volume was generated by drawing a cosine curve in 
the volume, then smoothing with a Gaussian filter. The second image shows the 
initial curve, a helix. The result of the evolution is shown in the rightmost image. 
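
The image-based vector that attracts the evolving tube can be computed pointwise from the image gradient and Hessian. The sketch below assumes the choice $g(r) = \exp(-r)$ stated in the text, for which $g'(r)/g(r) = -1$; the function name `image_vector` and its `g_ratio` parameter are hypothetical names introduced for illustration.

```python
import math


def image_vector(hess, grad, g_ratio=-1.0):
    """Return (g'/g) * H * grad / |grad| at one voxel.

    hess   : 3x3 Hessian of the intensity function (nested lists)
    grad   : image gradient at the same voxel (length-3 sequence, nonzero)
    g_ratio: value of g'/g; -1.0 corresponds to g(r) = exp(-r)
    """
    norm = math.sqrt(sum(c * c for c in grad))
    unit_grad = [c / norm for c in grad]
    return [g_ratio * sum(hess[i][j] * unit_grad[j] for j in range(3))
            for i in range(3)]


d = image_vector([[2, 0, 0], [0, 2, 0], [0, 0, 2]], [0, 0, 3])
```

The image term of the embedding-space equation is then the dot product of this vector with the gradient of v at the same voxel.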

Fig. 6. Overview of segmentation algorithm 



6 MRA Segmentation System 

This section describes our system for segmentation of vessels from MRA using 
the described level set method. A flowchart is shown in Fig. 6. We discuss issues 
that have arisen in converting the theory above to practice for this application. 

ε-Level Set Method: Since the projection operator $P_q$ is defined only for 
non-zero vectors q, the method is undefined at $\nabla v = 0$, which is the curve 
itself, and is numerically unstable near the curve. For this reason, we regard v 
as a distance function to a “tube” of small radius ε around the curve, instead of 
extracting the true 1D curve. That is, we evolve the ε-level set instead of evolving 
the true curve (Fig. 4(b)). Note that ε does not denote a fixed value here: we 
mean simply that the evolving shape is a “tubular” surface of some (unspecified 
and variable) nonzero width. In addition to being more robust, this method 
better captures the geometry of blood vessels, which have nonzero diameter. 

Banding: Instead of evolving the entire volume, we evolve only the portion 
of the volume within a narrow band of the zero level set (the current surface). 
This technique is commonly used in level set methods. Normally, we set the 
band to include voxels that are up to 6 voxels away from the surface. We have 
increased this distance up to 12 for some experiments. The advantage of this 
technique is efficiency, and the disadvantage is that we may miss structures that 
are outside the band if the potential function g does not have a large enough 
capture range to attract the segmentation to these structures. This issue can be 
addressed by ensuring that g is compatible with the band size. 
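
The banding step amounts to selecting the voxels whose current distance value is small in magnitude. A minimal sketch, assuming the volume v already holds distances measured in voxels and is stored as nested lists indexed `[z][y][x]`; the name `narrow_band` is hypothetical.

```python
def narrow_band(dist, width=6.0):
    """Return (z, y, x) indices of voxels within `width` of the zero level set."""
    return [(z, y, x)
            for z, plane in enumerate(dist)
            for y, row in enumerate(plane)
            for x, d in enumerate(row)
            if abs(d) <= width]


# One row of a distance volume crossing the surface at x = 2.
band = narrow_band([[[-8, -3, 0, 5, 7]]])
```

Only the voxels returned here would be updated in each iteration; the default width of 6 follows the value quoted in the text.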

Curvature Instead of Eigenvalues: For computational efficiency, and because 
of numerical instability of the gradient computations and thus of the evolution 
equation near $\nabla v = 0$, we remark that the level sets of the function v flow 
in the direction of the normal with velocity equal to the sum of their smaller 
principal curvature and the dot product of $\nabla v$ with the image-based vector field 
d. Therefore, we compute the smaller curvature directly from v instead of as an 
eigenvalue of $P_{\nabla v} \nabla^2 v P_{\nabla v}$. 

Image Scaling: To control the trade-off between fitting the surface to the 
image data and enforcing the smoothness constraint on the surface, we add an 
image scaling term imscale to Equation 5 to obtain 

$$v_t = \lambda(\nabla v(x, t), \nabla^2 v(x, t)) + \mathit{imscale} \cdot \frac{g'}{g} \, \nabla v(x, t) \cdot H \frac{\nabla I}{|\nabla I|} \qquad (6)$$

imscale is set by the user or can be pre-set to a default value. 

Gradient Directionality: Because vessels appear brighter than the background, 
we weight the image term by the cosine of the angle between the normal 
to the surface and the gradient in the image. This cosine is given by the dot 
product of the respective gradients of v and I, so the update equation becomes 

$$v_t = \lambda(\nabla v(x, t), \nabla^2 v(x, t)) + \mathit{imscale} \cdot (\nabla v \cdot \nabla I) \cdot \frac{g'}{g} \, \nabla v(x, t) \cdot H \frac{\nabla I}{|\nabla I|} \qquad (7)$$



For example, if the two vectors point in the same direction, then the brighter 
region is inside the surface and the darker region is outside; the angle between 
the vectors is 0, whose cosine is 1, so the image term is fully counted. However, 
if they point in opposite directions, the negative weighting prevents the evolving 
vessel walls from being attracted to image gradients that point in the opposite 
direction. 
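
The directional weight can be illustrated with a few vectors. The sketch below normalizes both gradients before taking the dot product so that the result is exactly the cosine of the angle between them; this normalization and the name `direction_weight` are assumptions made for the illustration.

```python
import math


def direction_weight(grad_v, grad_i):
    """Cosine of the angle between the gradient of v and the image gradient."""
    dot = sum(a * b for a, b in zip(grad_v, grad_i))
    norm_v = math.sqrt(sum(a * a for a in grad_v))
    norm_i = math.sqrt(sum(b * b for b in grad_i))
    if norm_v == 0.0 or norm_i == 0.0:
        return 0.0  # no preferred direction where either gradient vanishes
    return dot / (norm_v * norm_i)


same = direction_weight((0, 0, 2), (0, 0, 5))       # aligned gradients
opposite = direction_weight((0, 0, 2), (0, 0, -5))  # opposed gradients
```

Aligned gradients give a weight of 1, so the image term is fully counted, while opposed gradients give a weight of -1.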

Reinitializing Volume: As customary in level set segmentation methods, 
the volume v is periodically reinitialized to be a distance function: the zero level 
set S is extracted, then each point in the volume is set to be its distance to S. For 
our implementation, this reinitialization is itself a level set method. To obtain 
the positive distances, the surface is propagated outward at constant speed of 1, 
and the distance at each point is determined to be the time at which the surface 
crossed that point. A second step propagates the surface inward to obtain the 
negative distances analogously. For some experiments, we have used the Fast 
Marching Method to implement these steps. 

Initial Surface: Figure 7 shows additional detail on the generation of the 
initial surface. This initial surface (and thus the initial volume) is normally 
generated by thresholding the MRA dataset. However, the method does not 
require that the initial surface be near the target surface but may use any initial 
surface. Figure 8 illustrates a vertical bar evolving into the segmentation of the 
first dataset in Fig. 10. 

Fig. 7. More detailed illustration of initialization part of algorithm 

Fig. 8. Illustration of a vertical bar evolving in a segmentation of the first dataset in 
Fig. 10 

Smoothing: As shown in Fig. 6, the datasets may be pre-processed to reduce 
noise. For the results presented here, the raw datasets were convolved with an 
isotropic Gaussian of σ = 0.5. 

Cleaning: We post-process the segmentations to remove any surface patches 
whose surface area is less than some threshold (a parameter of the method) to 
eliminate patches corresponding to noise in the original data. 
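
The cleaning step can be sketched as a connected-component filter. Note the stand-in: the paper thresholds the surface area of surface patches, whereas this illustration counts voxels in 6-connected components of a binary volume, which is a simplifying assumption. The name `remove_small_components` is hypothetical.

```python
from collections import deque


def remove_small_components(mask, min_size):
    """Keep only 6-connected components with at least `min_size` voxels.

    mask is a binary volume stored as nested lists indexed [z][y][x].
    """
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    cleaned = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    seen = set()
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first flood fill to collect one component.
                component = []
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    component.append((cz, cy, cx))
                    for dz, dy, dx in steps:
                        nb = (cz + dz, cy + dy, cx + dx)
                        if (0 <= nb[0] < nz and 0 <= nb[1] < ny
                                and 0 <= nb[2] < nx
                                and mask[nb[0]][nb[1]][nb[2]]
                                and nb not in seen):
                            seen.add(nb)
                            queue.append(nb)
                if len(component) >= min_size:
                    for cz, cy, cx in component:
                        cleaned[cz][cy][cx] = True
    return cleaned


cleaned = remove_small_components([[[1, 1, 1, 0, 1, 0]]], min_size=2)
```

In the toy row above, the isolated voxel is discarded as noise while the three-voxel run is kept.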

Vessel Radii Estimation: The larger principal curvature can be useful in 
measuring the radii of the vessels for a particular application, since radius is the 
inverse of curvature. This curvature can be easily computed when the smaller 
principal curvature is computed for the segmentation. We have added the option 
to color-code our segmentations based on vessel radii, as estimated from the local 
larger principal curvature of the tubular surface. 



7 Results 

We demonstrate segmentation results on four datasets, courtesy of the Surgical 
Planning Laboratory, Brigham and Women's Hospital and Harvard Medical 
School (Figs. 9-12). All datasets had an initial resolution of 0.9375 x 0.9375 x 
1.5 mm³ (256x256x60 voxels). Only the final example was resampled to 0.9375 x 
0.9375 x 0.9375 mm³ (256x256x96 voxels) before segmentation; the other 
segmentations were performed directly on the raw data. The images are not square 
(256x256) because uninteresting portions were cropped for efficiency. In Fig. 10, 
the initial surface for the segmentation was a surface obtained by thresholding 
the raw dataset, whereas in Fig. 9 it was a tube as in Fig. 8. imscale also varied, as 
discussed below. For comparison, Fig. 10 first shows results obtained by 
thresholding alone. Figure 11 shows an enlargement of a portion of the segmentations 
and corresponding maximum intensity projection considered in Fig. 10. 

The following parameters were used in these experiments; all settings were 
chosen empirically. For our method, imscale varied across the datasets depending 
on the noise present. A threshold $t_{init}$ was used in Fig. 10 to obtain the initial 
surface from the dataset; such a threshold was obviously not needed in Fig. 9. 
A cleaning threshold c indicated the minimum surface area of connected 
components of the surface to be retained in the post-processing “cleaning” step. 

For thresholding only, the threshold $t_{thresh}$ was chosen and also the cleaning 
threshold c. For all datasets, $t_{init}$ was slightly higher than $t_{thresh}$ for the same 
dataset: although using a lower $t_{thresh}$ alone looks better after the cleaning step, 
the noise before cleaning worsened our results and led us to use a slightly higher 
value for initialization. 

Fig. 9. The first image in each row is the maximum intensity projection of the raw data, 
and the second and third are the segmentation result from two orthogonal viewpoints. 
These results are obtained by our method where the initial surface was a vertical bar 
as shown in Fig. 8 

Fig. 10. Results on three datasets are shown. For each dataset, the first image 
is the maximum intensity projection of the raw data, the second is the segmentation 
result from thresholding only and the third is the segmentation result using our method 

Recall that obtaining the very small vessels is the goal of this work since 
the large vessels are easily segmented by thresholding. For this reason, imscale 
was set fairly high in the experiments in Fig. 10 to obtain the small vessels, at 
the expense of also obtaining many imaging artifacts. A coarser segmentation is 
obtained in Fig. 9 by choosing lower values for imscale. Although the results in 
this figure are only similar to those obtained by simple thresholding, the objective 
of the demonstration is academic: it shows that we capture the vasculature shape 
even when the initial guess is meaningless. In practice, better results are obtained 
using thresholding for initialization. 

When considering that the imscale parameter controls the trade-off between 
noise and small vessels in our method, and when comparing our method to 
thresholding alone, it is important to note that it would not be possible to 
similarly lower $t_{thresh}$ to obtain the small vessels (and noise) by thresholding 
alone. Lowering the threshold obtains large blobs in the volume which do not 
correspond to vessels. Our method is thus more powerful than thresholding alone. 

Finally, we demonstrate the capability to color-code the vasculature surface 
based on local curvature. Notice (Fig. 12) that for a ribbon-like vessel, the flatter 
sides show a large radius, and the sharply curved edges show a small radius. 
In this example, the colorscale is continuous from darkest to lightest intensities, 
with darkest indicating a radius of curvature < 1mm and lightest indicating a 
radius of curvature > 2mm. The curvatures output by our evolution have been 
smoothed by a 3x3x3 filter prior to coloring the surface. 
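
The color-coding can be sketched as a mapping from the larger principal curvature to a gray level. The linear ramp between the two radii is an assumption; the text only states that the scale is continuous, with the darkest value below 1mm and the lightest above 2mm. The name `radius_to_gray` is hypothetical.

```python
def radius_to_gray(kappa, r_dark=1.0, r_light=2.0):
    """Map a curvature (in 1/mm) to a gray level in [0, 1] via radius = 1/kappa.

    0.0 is the darkest value (radius <= r_dark mm) and 1.0 the lightest
    (radius >= r_light mm); in between, a linear ramp is assumed.
    """
    radius = float("inf") if kappa <= 0 else 1.0 / kappa
    if radius <= r_dark:
        return 0.0
    if radius >= r_light:
        return 1.0
    return (radius - r_dark) / (r_light - r_dark)
```

For example, a curvature of 2 per mm (radius 0.5mm) maps to the darkest value and a curvature of 0.5 per mm (radius 2mm) to the lightest.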

8 Future Work 

Vessels tend to appear thinner in our segmentations than in those obtained by 
thresholding. One possible reason is that our method uses gradients instead of 
intensities, so the vessel wall is attracted to the strongest gradients, which 
may be fully inside the bright region indicated by thresholding. A second possible 
reason is that the underlying mathematics of our algorithm assume that the vessels 
are 1D curves, not tubular surfaces. We believe that our ε-level set method 
allows the method to successfully handle tubular surfaces, but have not yet 
verified this analytically. A final potential reason for the discrepancy is that 
the segmentations obtained by thresholding may be thicker than the true blood 
vessels due to noise around the vessels. Future work will involve comparisons to 
manual segmentations which will provide ground truth to evaluate both methods. 

We also observe considerable noise in our segmentations of the first and second 
datasets. As mentioned above, we could obtain much less noise at the expense of 
the thinnest vessels by lowering imscale. Given the large amounts of noise in these 
datasets, noise is often indistinguishable from small vessels when only a small 
local neighborhood is considered, as in our algorithm. To address this problem, 
one could reduce noise prior to segmentation by filtering or incorporate a more 
sophisticated image measure into the evolution equation. 

On the positive side, the segmentation of small vessels that were not ob- 
tainable by thresholding encourages us to continue in the development of this 
algorithm. Although still in preliminary stages, we believe that it has the poten- 
tial to yield effective segmentations of very thin vessels. 

Fig. 11. Enlargement of a portion of the second example from Fig. 10. As above, the 
second image is the segmentation obtained by thresholding alone, and the third image 
is the result of our method 




Fig. 12. Our method naturally allows estimation of local radii of curvature of the 
segmented vessels. In this image of a partial segmentation of the first dataset in Fig. 10, 
the colorscale is continuous from darkest to lightest intensities, with darkest indicating 
a radius of curvature < 1mm and lightest indicating a radius of curvature > 2mm 

Abstract. An algorithm is proposed for the fuzzy segmentation of two- 
and three-dimensional multispectral magnetic resonance (MR) images 
that have been corrupted by intensity inhomogeneities, also known as 
shading artifacts. The algorithm is an extension of the two-dimensional 
adaptive fuzzy C-means algorithm (2-D AFCM) presented in previous 
work by the authors. This algorithm models the intensity inhomogeneities 
as a gain field that causes image intensities to smoothly and slowly vary 
through the image space. It iteratively adapts to the intensity 
inhomogeneities and is completely automated. In this paper, we fully generalize 
2-D AFCM to three-dimensional (3-D) multispectral images. Because 
of the potential size of 3-D image data, we also describe a new, faster 
multigrid-based algorithm for its implementation. We show using 
simulated MR data that 3-D AFCM yields significantly lower error rates than 
both the standard fuzzy C-means algorithm and several other competing 
methods when segmenting corrupted images. Its efficacy is further 
demonstrated using real 3-D scalar and multispectral MR brain images. 



1 Introduction 

Tissue classification is a necessary step in many medical imaging applications 
including the quantification of tissue volumes, study of anatomical structure, 
and computer integrated surgery. Classification of voxels exclusively into dis- 
tinct classes, however, is problematic due to artifacts such as noise and the 
partial volume effect, which occurs when multiple tissues are present in a single 
voxel. To compensate for these artifacts, there has recently been growing interest 
in fuzzy segmentation methods. In fuzzy segmentations, voxels may be classified 
into multiple classes with a varying degree of membership. The membership thus 
gives an indication of where noise and partial volume averaging have occurred 
in the image. Standard fuzzy segmentation algorithms, however, do not effec- 
tively compensate for intensity inhomogeneities, a common artifact in magnetic 
resonance (MR) images. 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 140-[23 1999- 
© Springer-Verlag Berlin Heidelberg 1999 

In MR images, intensity inhomogeneities are typically caused by 
non-uniformities in the RF field during acquisition, although other factors also play a role. 
The result is a shading effect where the pixel or voxel intensities of the 
same tissue class vary over the image domain. It has been shown that the 
shading in MR images is well modeled by the product of the original image and a 
smooth, slowly varying gain field. Corrupted images may be segmented 
by first applying a correction algorithm to remove intensity 
inhomogeneities, and then applying a standard segmentation algorithm that assumes 
no inhomogeneities are present. 

Several methods have also been proposed that simultaneously compensate 
for the shading effect while segmenting the image. These methods have the 
advantage of being able to use intermediate information from the segmentation 
while performing the correction. Most of these methods, however, have focussed 
on classifying each voxel into distinct tissue classes. An 
expectation-maximization algorithm has also been proposed that models the 
inhomogeneities as a bias field of the image logarithm. This method is capable of 
obtaining fuzzy segmentations based on posterior probabilities, but for most 
data sets some manual interaction is required to provide training data. 

Recently, we presented some initial results on an unsupervised segmentation 
algorithm called the adaptive fuzzy C-means algorithm (AFCM), designed 
for segmenting two-dimensional (2-D) scalar images corrupted by intensity 
inhomogeneities. Based on the fuzzy C-means algorithm (FCM), 2-D AFCM has the 
advantages that it automatically produces fuzzy segmentations, 
it is robust to inhomogeneities, and it computes a smooth gain field based 
on all pixels in the image. Although this algorithm is suitable for the 
segmentation of MR images obtained using single or multi-slice acquisitions, it cannot be 
used in volumetric acquisitions where the inhomogeneities are three-dimensional 
(3-D) in nature, nor can it be used on multispectral data. 

In this paper, we generalize AFCM to 3-D multispectral images. Our gen- 
eralization also allows for the adjustment of the “crispness” or “fuzziness” of 
the resulting segmentation and for the segmentation of data with ellipsoidal 
shaped clusters. A novel algorithm is presented for computing the gain field 
that typically yields a threefold improvement in speed over a standard multigrid 
approach without reducing accuracy. This speed improvement is especially sig- 
nificant when working with large 3-D data sets. We also provide in this paper 
several new results using simulated data that show that the segmentations ob- 
tained using FCM on uncorrupted images and AFCM on corrupted images are 
accurate both in terms of classification and modeling of partial volume effects. 
Moreover, we show that under default initializations, AFCM’s performance on 
corrupted 3-D images is superior to the performance of competing methods 
from the literature. 

2 Background 

In this section, we give a brief overview of FCM and 2-D AFCM. FCM has 
previously been used with some success in the fuzzy segmentation of magnetic 
resonance (MR) images as well as for the estimation of partial 
volumes. It clusters data by computing a measure of membership, called the 
fuzzy membership, at each voxel for a specified number of classes. The fuzzy 
membership function, constrained to be between zero and one, reflects the 
degree of similarity between the data value at that location and the prototypical 
data value, or centroid, of its class. Thus, a high membership value near unity 
signifies that the data value at that location is “close” to the centroid for that 
particular class. 

FCM is formulated as the minimization of the following objective function 
with respect to the fuzzy membership functions u and the centroids v: 

$$J_{FCM} = \sum_{j \in \Omega} \sum_{k=1}^{C} u_{jk}^q \, \|y_j - v_k\|^2 \qquad (1)$$

Here, $\Omega$ is the set of voxel locations in the image domain, q is a parameter that is 
constrained to be greater than one, $u_{jk}$ is the membership value at voxel location 
j for class k such that $\sum_{k=1}^{C} u_{jk} = 1$, $y_j$ is the observed (vector) image intensity 
at location j, and $v_k$ is the centroid of class k. The total number of classes 
C is assumed to be known. The parameter q is a weighting exponent on each 
fuzzy membership and determines the amount of “fuzziness” of the resulting 
classification. For q = 1, $J_{FCM}$ reduces to the classical within-group sum of 
squared errors objective function and FCM becomes equivalent to the K-means 
or ISODATA clustering algorithms. A commonly used value is q = 2. 
The operator $\|\cdot\|$ is any inner product norm on $\mathbb{R}^P$, where P is the number of 
channels in the image, and $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$. By specifying the appropriate norm, 
FCM can be applied to data that possess ellipsoidal shaped clusters, although 
typically the Euclidean norm is used. 
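
A minimal sketch of FCM for scalar data with the Euclidean norm is given below. It alternates the standard membership and centroid updates that arise as necessary conditions for minimizing the objective function above; the fixed iteration count, the handling of zero distances, and the name `fcm` are assumptions made for this illustration.

```python
def fcm(values, centroids, q=2.0, iterations=50):
    """Fuzzy C-means on scalar data.

    values    : list of observed intensities y_j
    centroids : initial centroids v_k (one per class)
    Returns (u, v): memberships u[j][k] and final centroids v[k].
    """
    v = list(centroids)
    u = []
    for _ in range(iterations):
        # Membership update: u_jk proportional to ||y_j - v_k||^(-2/(q-1)).
        u = []
        for y in values:
            d2 = [(y - vk) ** 2 for vk in v]
            if min(d2) == 0.0:
                # y coincides with a centroid: full membership to that class.
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                u.append([h / sum(hits) for h in hits])
            else:
                w = [d ** (-1.0 / (q - 1.0)) for d in d2]
                u.append([wk / sum(w) for wk in w])
        # Centroid update: mean of the data weighted by u_jk^q.
        v = [sum(uj[k] ** q * y for uj, y in zip(u, values))
             / sum(uj[k] ** q for uj in u)
             for k in range(len(v))]
    return u, v


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
memberships, centers = fcm(data, [1.0, 4.0])
```

With two well-separated groups of intensities, the centroids settle near the group means and each voxel receives a membership close to one for its own class.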

The FCM objective function (1) is minimized when high membership values 
are assigned to voxels whose intensities are close to the centroid of their particular 
class and low membership values are assigned when the voxel intensity is far from 
the centroid. The resulting fuzzy segmentation can be converted to a hard or 
crisp segmentation by assigning each voxel solely to the class that has the highest 
membership value for that voxel. This is known as a maximum membership 
segmentation. The advantages of FCM are that it is unsupervised (i.e. it does 
not require training data), and it is robust to initial conditions j0|. However, 
FCM assumes that the centroids of the image are spatially invariant, which is 
not true of images that have been corrupted by intensity inhomogeneities. 
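
As a small illustration of Eq. (1) and of the maximum membership rule, the following Python sketch evaluates the FCM objective for given memberships and centroids and converts a fuzzy segmentation into a hard one. It assumes the Euclidean norm; the function names and the toy data are illustrative and are not taken from the paper.

```python
import numpy as np

def fcm_objective(y, u, v, q=2.0):
    """J_FCM = sum_j sum_k u_jk^q * ||y_j - v_k||^2 (Euclidean norm assumed).

    y: (N, P) voxel intensities, u: (N, C) memberships, v: (C, P) centroids.
    """
    d2 = ((y[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)  # (N, C) squared distances
    return float(((u ** q) * d2).sum())

def max_membership(u):
    """Hard (crisp) label per voxel: the class with the highest membership."""
    return np.argmax(u, axis=1)

# Toy data: four scalar "voxels" and two classes (values are illustrative only).
y = np.array([[1.0], [1.2], [4.8], [5.0]])
v = np.array([[1.0], [5.0]])
u = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.05, 0.95]])

labels = max_membership(u)
J = fcm_objective(y, u, v)
```

Swapping the two columns of `u` assigns high memberships to the distant centroid and gives a much larger objective value, which is the behavior described in the paragraph above.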

In order to preserve the advantages of FCM, we proposed the following
objective function in [?] for segmenting 2-D scalar images possessing intensity
inhomogeneities:

$$J_{AFCM2D} = \sum_{j \in \Omega} \sum_{k=1}^{C} u_{jk}^{2} \, \| y_j - g_j v_k \|^2
  + \lambda_1 \sum_{j \in \Omega} \sum_{r=1}^{2} \big( (D_r * g)_j \big)^2
  + \lambda_2 \sum_{j \in \Omega} \sum_{r=1}^{2} \sum_{s=1}^{2} \big( (D_r * D_s * g)_j \big)^2 \tag{2}$$

where yj is the pixel intensity, Vk is the centroid, gj is an unknown gain field to be 
estimated, and Dr is a (known) finite difference operator along the rth dimension 
of the image. The notation (D * g)j refers to the operation of convolving g with 
the difference kernel D and taking the resulting value at the jth pixel. Note 
that $J_{AFCM2D}$ assumes q = 2. Equation (2) models the brightness variation of
the inhomogeneity by allowing the centroids to spatially vary according to the 
gain field gj. The last two terms are first and second order regularization terms 
used to ensure gj is spatially smooth and slowly varying. The finite difference 
operators act like derivatives, except they are performed on a discrete domain. 
AFCM, like FCM, does not place any assumption of spatial smoothness on the 
membership functions uj. 
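
The two regularization terms can be sketched numerically as follows. This is only an illustration under stated assumptions: forward differences are used, the difference at the last sample along each axis is set to zero (the boundary handling is an assumption, as the text does not specify it), and the function names are hypothetical.

```python
import numpy as np

def forward_diff(g, axis):
    """Forward difference along one axis; the last sample along that axis is set to zero."""
    d = np.zeros_like(g, dtype=float)
    hi = [slice(None)] * g.ndim
    lo = [slice(None)] * g.ndim
    hi[axis] = slice(1, None)
    lo[axis] = slice(None, -1)
    d[tuple(lo)] = g[tuple(hi)] - g[tuple(lo)]
    return d

def smoothness_penalty(g, lam1, lam2):
    """lam1 * sum_r ||D_r * g||^2 + lam2 * sum_r sum_s ||D_r * D_s * g||^2."""
    dims = range(g.ndim)
    first = sum((forward_diff(g, r) ** 2).sum() for r in dims)
    second = sum((forward_diff(forward_diff(g, s), r) ** 2).sum()
                 for r in dims for s in dims)
    return lam1 * first + lam2 * second
```

A constant gain field has zero penalty, while any spatial variation is penalized, which is why large weights push the estimated field towards a constant.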

In [?], $J_{AFCM2D}$ was minimized by taking its first partial derivatives with
respect to u, v, and g, and iterating through these three necessary conditions.
The necessary condition on g leads to a difference equation with spatially varying
coefficients that was solved using a standard multigrid approach (see Sect. 3.3).

3 Adaptive Fuzzy C-Means 

In this section, we generalize the AFCM objective function to 3-D, multispectral 
images and describe an algorithm for minimizing the objective function. We also 
describe an implementation that yields much faster results than the standard 
multigrid approach. 



3.1 Objective Function 

When working with multispectral MR data corrupted by intensity inhomo- 
geneities, there are two possible assumptions one can make about the gain field: 
1) the gain field is a scalar field; 2) the gain field is a vector field. The first
assumption implies that the brightness variation in each component, or spectrum,
of the acquired image is identical, while the second assumes that they can be
different. In practice, we have found in double-echo MR data that the scalar gain field
assumption provides nearly identical segmentation results to the vector gain field 
assumption and is also faster, requiring fewer computations. Furthermore, the 
algorithm derived from the scalar case is notationally cleaner and therefore more 
easily explained. For these reasons, we focus mainly on the scalar assumption 
for the remainder of this paper. 

Using the scalar gain field assumption, we define AFCM to be an algorithm
that seeks to minimize the following objective function with respect to the
membership functions $u_{jk}$, the centroids $v_k$, and the gain field $g$:

$$J_{AFCM} = \sum_{j \in \Omega} \sum_{k=1}^{C} u_{jk}^{q} \, \| y_j - g_j v_k \|^2
  + \lambda_1 \sum_{j \in \Omega} \sum_{r=1}^{R} \big( (D_r * g)_j \big)^2
  + \lambda_2 \sum_{j \in \Omega} \sum_{r=1}^{R} \sum_{s=1}^{R} \big( (D_r * D_s * g)_j \big)^2 \tag{3}$$


This equation is applicable to 2-D images when R = 2 and to 3-D images when
R = 3. For R = 2, q = 2, and scalar image data, Eq. (3) reduces to the 2-D
AFCM objective function given in Eq. (2).

If we assume that the membership values $u_{jk}$ and the centroids $v_k$ are
known in $\Omega$, then the gain field that minimizes $J_{AFCM}$ is the field
that makes the centroids close to the data, but is also slowly varying and
smooth. Without the regularization terms, a gain field could always be found
that would set the objective function to zero. If $\lambda_1$ and $\lambda_2$
are set sufficiently large, then the gain field is forced to be constant and the
AFCM objective function essentially reduces to the standard FCM objective
function.

The scalar gain field objective function $J_{AFCM}$ in Eq. (3) can be minimized
by taking the first derivatives of $J_{AFCM}$ with respect to $u_{jk}$, $v_k$,
and $g_j$, setting them equal to zero, and iterating through these three
necessary conditions for $J_{AFCM}$ to be at a minimum. This yields the
following algorithm:

Algorithm 1: AFCM 

1. Provide initial values for the centroids, $v_k$, $k = 1, \ldots, C$, and set
the gain field $g_j$ equal to one for all $j \in \Omega$.

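2. Compute membership functions as follows:

$$u_{jk} = \frac{\| y_j - g_j v_k \|^{-2/(q-1)}}{\sum_{l=1}^{C} \| y_j - g_j v_l \|^{-2/(q-1)}} \tag{4}$$

for all $j \in \Omega$ and $k = 1, \ldots, C$.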

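3. Compute new centroids as follows:

$$v_k = \frac{\sum_{j \in \Omega} u_{jk}^{q} \, g_j \, y_j}{\sum_{j \in \Omega} u_{jk}^{q} \, g_j^{2}}, \qquad k = 1, \ldots, C. \tag{5}$$
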

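4. Compute a new gain field by solving the following space-varying difference
equation for $g_j$:

$$\sum_{k=1}^{C} u_{jk}^{q} \langle y_j, v_k \rangle
  = g_j \sum_{k=1}^{C} u_{jk}^{q} \langle v_k, v_k \rangle
  + \lambda_1 (H_1 * g)_j + \lambda_2 (H_2 * g)_j$$

where the convolution kernels $H_1$ and $H_2$ are given by

$$H_1 = \sum_{r=1}^{R} D_r * \check{D}_r \tag{6}$$

$$H_2 = \sum_{r=1}^{R} \sum_{s=1}^{R} (D_r * D_s) * (\check{D}_r * \check{D}_s) \tag{7}$$

where $\check{D}$ is the mirror reflection of the finite difference operator $D$.
Standard forward differences were used in this work.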

5. If the algorithm has converged, then quit. Otherwise, go to Step 2. 

We define convergence to be when the maximum change in the membership
functions over all pixels between iterations is less than a given threshold value.
In practice, we used a threshold value of 0.01. Methods for determining initial
centroids in Step 1 are described in Sect. 3.2. The solution to the difference
equation in Step 4 is described in Sect. 3.3.
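
Steps 2, 3 and 5 of Algorithm 1 can be sketched in Python as below. This is a minimal sketch, not the authors' C implementation: it assumes the Euclidean norm, keeps the gain field fixed (the multigrid solution of Step 4 is omitted), and adds a small constant `eps` to avoid division by zero. All function names and the toy data are illustrative.

```python
import numpy as np

def update_memberships(y, g, v, q=2.0, eps=1e-12):
    """Step 2: u_jk proportional to ||y_j - g_j v_k||^(-2/(q-1)), normalized over k."""
    d2 = ((y[:, None, :] - g[:, None, None] * v[None, :, :]) ** 2).sum(axis=2)
    w = (d2 + eps) ** (-1.0 / (q - 1.0))          # squared distance, hence exponent -1/(q-1)
    return w / w.sum(axis=1, keepdims=True)

def update_centroids(y, g, u, q=2.0):
    """Step 3: v_k = sum_j u_jk^q g_j y_j / sum_j u_jk^q g_j^2."""
    uq = u ** q                                    # (N, C)
    num = (uq * g[:, None]).T @ y                  # (C, P)
    den = (uq * (g ** 2)[:, None]).sum(axis=0)     # (C,)
    return num / den[:, None]

def has_converged(u_new, u_old, tol=0.01):
    """Step 5: maximum change in membership below the threshold."""
    return np.max(np.abs(u_new - u_old)) < tol

# Toy run with a unit gain field, which makes the iteration equivalent to plain FCM.
y = np.array([[1.0], [1.1], [0.9], [5.0], [5.1], [4.9]])
g = np.ones(len(y))
v = np.array([[0.0], [6.0]])
u = update_memberships(y, g, v)
done = False
for _ in range(50):
    v = update_centroids(y, g, u)
    u_new = update_memberships(y, g, v)
    done = has_converged(u_new, u)
    u = u_new
    if done:
        break
```

With a spatially varying `g`, the same two updates apply unchanged; only the gain field solve between them is missing from this sketch.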

3.2 Initial Centroids 

AFCM requires an initial estimate of centroid values. Like FCM, AFCM is fairly 
robust to the selection of these initial estimates; however, proper selection will 
generally improve accuracy and convergence of the algorithm. We propose two 
methods for automatically selecting initial centroids: the first method may be 
applied generally to all scalar data, while the second method is specific to mul- 
tispectral MR images. 

If the given data is scalar-valued, then one can apply the approach described
in [?], where the modes of a critically smoothed kernel estimator of the
image histogram are used to determine the initial centroids. The approach is
essentially the same as the “bump-hunting” algorithm described by Silverman
in [?]. Briefly, a kernel estimator of the histogram is smoothed in an iterative
fashion until it possesses a number of modes equal to the desired number of
classes, C. These modes are then numerically computed using first and second
derivatives of the kernel estimator and used as initial centroids.
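
The mode-based initialization can be sketched as follows. This is a simplified illustration, not the cited procedure: it assumes a Gaussian kernel, widens the bandwidth by a fixed factor until at most C modes remain, and locates modes by comparing neighboring grid values rather than by the derivative-based computation mentioned above. The function names, starting bandwidth and growth factor are all assumptions.

```python
import numpy as np

def kernel_estimate(samples, grid, h):
    """Gaussian kernel density estimate of the intensity distribution on a grid."""
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (samples.size * h * np.sqrt(2.0 * np.pi))

def local_maxima(density):
    """Indices of interior grid points that are local maxima of the estimate."""
    mid = density[1:-1]
    return np.where((mid > density[:-2]) & (mid >= density[2:]))[0] + 1

def initial_centroids(samples, n_classes, h=1.0, growth=1.1, n_grid=512):
    """Increase the bandwidth until the estimate has at most n_classes modes."""
    samples = np.asarray(samples, dtype=float).ravel()
    grid = np.linspace(samples.min(), samples.max(), n_grid)
    while True:
        modes = local_maxima(kernel_estimate(samples, grid, h))
        if modes.size <= n_classes:
            return grid[modes], h
        h *= growth
```

The returned mode locations would serve as the initial centroids in Step 1 of Algorithm 1; for large images the estimate would normally be built from the histogram rather than from individual voxels.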

For multispectral data, manipulation of a multidimensional kernel estimator
can be computationally prohibitive. In this case, one can obtain initial centroids
by applying the approach described in [?]. This approach requires a priori
knowledge of the approximate T1, T2, and proton spin density of the tissue
classes being segmented. Most of these values for different tissue classes have
been documented in the literature (cf. [2]). These values can then be used in an
imaging equation derived for the corresponding pulse sequence (e.g. spin echo) to
obtain expected intensity values. This rough initialization is normally sufficient
for AFCM to yield good convergence properties.
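
As an illustration of this initialization, the sketch below uses the commonly quoted simplified spin echo signal equation, S = PD (1 - exp(-TR/T1)) exp(-TE/T2), to turn tissue parameters into expected dual-echo intensities. The tissue values and acquisition times are placeholders chosen for the example; they are not taken from the paper or from its references.

```python
import numpy as np

def spin_echo_intensity(pd, t1, t2, tr, te):
    """Simplified spin echo signal: PD * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return pd * (1.0 - np.exp(-tr / t1)) * np.exp(-te / t2)

# (PD, T1 in ms, T2 in ms): placeholder values for illustration only.
tissues = {
    "CSF": (1.00, 4000.0, 2000.0),
    "GM": (0.85, 1300.0, 100.0),
    "WM": (0.70, 800.0, 80.0),
}

# Hypothetical dual-echo acquisition: one TR, a short (PD-weighted) and a long (T2-weighted) TE.
tr, te_pd, te_t2 = 3000.0, 20.0, 80.0

# One two-channel initial centroid per tissue class.
centroids = {
    name: np.array([spin_echo_intensity(pd, t1, t2, tr, te) for te in (te_pd, te_t2)])
    for name, (pd, t1, t2) in tissues.items()
}
```

In practice the resulting values would still need to be scaled to the intensity range of the acquired images before being used as initial centroids.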

3.3 Solution to Gain Field 

In Step 4 of AFCM, a new gain field is computed given the current values of the
centroids and membership functions. This is the most computationally intensive
step in AFCM and deserves special attention in its numerical implementation.
Because the difference equation in Step 4 is space-varying, the gain field cannot
be found using standard frequency domain filters. The equation could be solved
iteratively using the Jacobi or Gauss-Seidel schemes, but these methods take
a large number of iterations to converge. In [?], this equation was solved
using a standard multigrid algorithm at each iteration of AFCM (for a general
overview of multigrid algorithms, see [?] or [?]). For 2-D images, this approach
is sufficiently fast, but for large 3-D images, execution times can grow to several
hours. We now describe a modified multigrid algorithm that yields significantly
faster overall execution time without loss of accuracy. Its premise is that during
early iterations of AFCM, only an approximate solution to the gain field is
required. Thus, a subsampled solution is used and later refined as the number
of iterations increases.

Figure 1a illustrates the structure of a multigrid pyramid. Level 0 represents
the original resolution of the data, while the higher levels represent
increasingly coarser representations of the data. The basis of a multigrid
algorithm is the replacement of fine grid iterations for solving the difference
equation in Step 4 with iterations on a coarse grid, thereby reducing the number
of computations required. In addition, the multiresolution update scheme used in
a multigrid algorithm yields much faster convergence. In [?], the gain field was
computed by applying one full multigrid V-cycle at each iteration of 2-D AFCM.
A four-level full multigrid V-cycle is illustrated in Fig. 1b.

For 3-D images, we propose a new, faster method that takes advantage of
the fact that during early iterations of AFCM, the estimates of the centroid and
membership functions are poor and an exact solution to the gain field is not
necessary. We define a truncated multigrid cycle at level L to be a full multigrid
V-cycle that terminates the first time the Lth pyramid level is reached. In
Fig. 1b, the termination points of a truncated multigrid cycle are shown as open
circles. For a truncated multigrid cycle at level L > 0, the estimated gain field
is an approximation of the final solution on a coarse grid, but it can be computed
quickly. The implementation of AFCM using a truncated full multigrid cycle
proceeds as follows:

Algorithm 2: AFCM using truncated multigrid cycle

1. Set the size of the multigrid pyramid to some value K. Set L = K - 2.

2. Run entire AFCM algorithm until convergence using a truncated multigrid
cycle at level L to solve for the gain field at each iteration.

3. If L > 0, decrease L by 1. Using the most recent values of u, v, and g as
initial values, go to Step 2. Else if L = 0, terminate.

This modified multigrid algorithm greatly increases the speed of AFCM during
its early iterations. As the number of iterations increases, the truncation
level reduces towards the original resolution and the iterations become slower.
If a result is required quickly, one can terminate Algorithm 2 at some value of 
L > 0. This provides an approximation of the final solution. We have found that 
since the gain field is smooth, the approximation error decreases rapidly as the 
resolution increases. 
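
The control flow of Algorithm 2 can be sketched as a short driver loop. The callable `afcm_until_convergence` is a hypothetical stand-in for Algorithm 1 run to convergence with the gain field solved by a multigrid cycle truncated at the given level; the optional `stop_level` argument reflects the early termination option described above. Neither name comes from the paper.

```python
def run_truncated_afcm(afcm_until_convergence, state, pyramid_size, stop_level=0):
    """Rerun AFCM at successively finer truncation levels (Algorithm 2).

    afcm_until_convergence(state, level) -> state is supplied by the caller;
    `state` holds the current u, v and g in whatever form that routine expects.
    """
    level = pyramid_size - 2          # Step 1: L = K - 2
    visited = []
    while level >= stop_level:        # Step 3: stop after L = 0 (or earlier if requested)
        state = afcm_until_convergence(state, level)   # Step 2, warm-started
        visited.append(level)
        level -= 1
    return state, visited
```

Each pass starts from the most recent u, v and g, so the expensive fine-level solves only refine an estimate that is already close to the final solution.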



4 Results 

AFCM was implemented in C on a Silicon Graphics O2 system with an R10000
processor running IRIX 6.3. It has been tested on both real MR data as well
as simulated MR brain images obtained from the Brainweb simulated brain
database at the McConnell Brain Imaging Centre of the Montreal Neurological
Institute, McGill University [?]. (Simulated brain data sets of varying noise,
inhomogeneity, and contrast are available on the World Wide Web at the website
listed under References.) In this section, we present the application of AFCM
only to 3-D images. For 2-D results, readers are referred to [?]. In all results
that follow, the value of q was set to 2, and the standard Euclidean distance norm
was used. We denote the AFCM results computed with the full multigrid V-cycle
as FM-AFCM and the results computed with the truncated multigrid V-cycle
as TM-AFCM. Using FM-AFCM, execution times for a 3-D, T1-weighted, MR
data set with 1 mm cubic voxels are typically between 45 minutes and 3 hours.
Using TM-AFCM, execution times are between 10 minutes and 1 hour. We show
in this section that this speed increase does not result in reduced accuracy.

4.1 Visual Evaluation of Performance on Simulated Data 

Figure 2 shows the results of applying FCM and AFCM to a Brainweb simulated
MR brain image. This brain image was simulated with T1-weighted contrast,
1 mm cubic voxels, 3% noise and 40% image intensity inhomogeneity. All
extracranial tissue was removed prior to applying the segmentation algorithms.
The number of tissue classes was assumed to be three, corresponding to gray
matter (GM), white matter (WM), and cerebrospinal fluid (CSF) tissue classes.
Background pixels were ignored. Figure 2a shows a slice from the simulated data
set and Fig. 2b shows the true partial volume model of the GM tissue class that
was used to generate the simulated image. Figures 2c and 2d show the GM
membership function obtained by applying FCM and TM-AFCM, respectively, to
the 3-D data set. Bright areas represent where the membership function is close
to one. Because of the shading effect present in the data, the FCM membership
function deteriorates near the bottom of the image. The AFCM result, however,
shows less speckling at the bottom of the image and is very similar to the true
partial volume image. Both results do, however, show some overall grain because
of the effects of noise.

Fig. 2. FCM and AFCM membership functions: (a) simulated MR phantom, (b) GM
partial volume truth model, (c) FCM GM membership function, (d) TM-AFCM GM
membership function.

Fig. 3. Comparison of hard segmentations: (a) truth model, (b) FCM max membership
segmentation, (c) AMRF segmentation, (d) TM-AFCM max membership segmentation.

Figure 3 shows the results of three different segmentation algorithms applied
to the same data set described in the previous example. Figure 3a shows the
true hard segmentation of the simulated data. CSF is labeled as dark gray, GM
as light gray, and WM as white. Figures 3b-d show the maximum membership
segmentation produced by FCM, the segmentation produced by the adaptive
Markov random field (AMRF)* method used in [?], and the maximum membership
segmentation produced by TM-AFCM, respectively. Clearly, the AFCM
segmentation is most similar to the truth model. Both the FCM and AMRF
results segment much of the WM as GM near the bottom of the image. The
AMRF segmentation is also spatially smoother than the other methods. This
is because it takes into account pixel dependency, while both FCM and AFCM
classify pixels independently.

* This method is also very similar to the one described in [?].

Table 1. Error measures from simulated data results

Method      0% MSE   20% MSE   40% MSE   0% MCR   20% MCR   40% MCR
FCM         0.0194   0.0272    0.0517    3.988%   5.450%    9.046%
FM-AFCM     0.0210   0.0242    0.0251    4.171%   4.322%    5.065%
TM-AFCM     0.0210   0.0214    0.0244    4.168%   4.322%    4.938%
EM          0.0437   0.0491    0.0770    6.344%   7.591%    13.768%
AMRF        -        -         -         3.876%   4.795%    6.874%
MNI-FCM     -        -         -         4.979%   4.970%    5.625%

4.2 Quantitative Evaluation of Performance on Simulated Data 

Table 1 summarizes error measures resulting from applying the FCM, FM-AFCM,
TM-AFCM and AMRF algorithms to Brainweb simulated T1-weighted data sets
(1 mm cubic voxels, 3% noise) with varying levels of inhomogeneity.
Also shown are the errors using an expectation-maximization (EM) algorithm
for finite Gaussian mixture models [?]. In addition, error measures were also
computed for a segmentation obtained by first applying the N3 inhomogeneity
correction software [?] obtained from the Montreal Neurological Institute, then
applying FCM. The results of this method are given in the row labeled MNI-FCM.
Two error measures were used. The first measure was the mean squared
error (MSE) between the true GM partial volume and the GM fuzzy membership
function. For the EM algorithm, the posterior probability of each tissue
class given the data was compared with the GM partial volume. The second
error measure used was the misclassification rate (MCR), defined as the number
of pixels misclassified by the algorithm divided by the total number of pixels in
the image. For FM-AFCM and TM-AFCM, the parameters Ai and A 2 were fixed 
to a default value of 2 x 10'^ and 2 x 10® respectively. Default parameters were 
also used for all other segmentation methods. 
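
The two error measures can be written down directly. The sketch below assumes that the truth model and the result are arrays of the same shape, restricted to the pixels being evaluated; the function names are illustrative.

```python
import numpy as np

def mean_squared_error(true_partial_volume, membership):
    """MSE between the true GM partial volume and the GM fuzzy membership function."""
    diff = np.asarray(true_partial_volume, dtype=float) - np.asarray(membership, dtype=float)
    return float(np.mean(diff ** 2))

def misclassification_rate(true_labels, labels):
    """Fraction of pixels whose hard label differs from the truth model."""
    return float(np.mean(np.asarray(true_labels) != np.asarray(labels)))
```

Multiplying the misclassification rate by 100 gives the percentages reported in Table 1.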

Columns 1-3 show the MSE resulting from segmenting data sets with 0%,
20%, and 40% inhomogeneity, respectively. Similarly, columns 4-6 show the MCR
for the same respective data sets. The MSE columns show that AFCM is capable
of estimating partial volume coefficients with a reasonable accuracy even in the
presence of inhomogeneities. The MCR columns show that as the inhomogeneity
is increased, the errors for all methods also increase. However, the AFCM
methods are much more robust to increased inhomogeneity than the other
methods, with TM-AFCM achieving slightly lower errors than FM-AFCM. The EM
algorithm performs poorly with respect to both error criteria, possibly because
the Gaussian mixture model assumption is incorrect. In the case of 40%
inhomogeneity, AFCM provides an improvement of nearly 50% over FCM, nearly
30% over the AMRF method, and over 10% over the MNI-FCM method. At zero
inhomogeneity, both the FCM and AMRF methods perform slightly better than
AFCM, with AMRF yielding the lowest error. This is expected since the AMRF
method provides some smoothing of noise, while FCM and AFCM do not. The
increase in error of AFCM over FCM in the zero inhomogeneity case is due to
the additional freedom of the gain field. This effect is also seen in the errors
resulting from the MNI-FCM method. One could easily reduce the error by
increasing the regularization terms, if the amount of inhomogeneity was known to
be low. The difference in error is small, however, and overall, AFCM performs
well on images of varying inhomogeneity without the need for modifying the
regularization parameters. Note that one can potentially achieve much lower
errors in each of the AFCM, AMRF, and MNI-FCM methods if more information
about the inhomogeneity is known a priori, thereby allowing some tailoring of
their parameters.



4.3 Correction of Inhomogeneities 

Figure 4 shows the results of using AFCM to correct the inhomogeneity in an
actual 3-D T1-weighted MR image data set. Figure 4a shows a slice from the
original data set. Figure 4b shows the same slice after correction by AFCM.
The correction was obtained by multiplying the original image by the reciprocal
of the estimated gain field. The corrected image does not exhibit the left to
right shading present in the original image. Figure 4c shows the computed gain
field for that slice. The gain field is actually computed everywhere in the image
domain but for visual purposes, it has been masked by the brain area. Note the
bright area on the upper left quadrant of the image has been captured by the
gain field.
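
The correction step amounts to a voxelwise division by the estimated gain field. The sketch below illustrates it; the lower bound `eps` on the gain and the optional brain mask are assumptions added for numerical safety and display, and the function name is hypothetical.

```python
import numpy as np

def correct_inhomogeneity(image, gain, mask=None, eps=1e-6):
    """Multiply the image by the reciprocal of the estimated gain field."""
    image = np.asarray(image, dtype=float)
    safe_gain = np.maximum(np.asarray(gain, dtype=float), eps)  # guard against division by zero
    corrected = image / safe_gain
    if mask is not None:
        corrected = np.where(mask, corrected, 0.0)              # keep only the masked (brain) area
    return corrected
```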

Figures 4d and 4e show histograms of the slice before and after the correction
has been performed. On a typical histogram of an uncorrupted MR image, three
modes are present corresponding to (from left to right) CSF, GM, and WM. The
original histogram in Fig. 4d, however, exhibits an additional mode around an
intensity of 80 that corresponds to the bright WM on the upper left of the image
slice. The corrected histogram does not possess this additional mode and also
shows a significant improvement in contrast between the modes corresponding
to GM and WM.



4.4 Multispectral Data 

Figure 5 shows the results of FCM and TM-AFCM when applied to a 3-D spin
echo (T2-weighted and proton spin density (PD) weighted) multispectral MR
data set that has been preprocessed to remove extracranial tissues. Figures 5a
and 5b show a PD-weighted and the corresponding T2-weighted slice,
respectively, from the data set. Figures 5c and 5d show the GM membership
functions computed by FCM and AFCM, respectively. One can see that the FCM
membership function has a noticeable fading on the left side. There is also an
increased speckling in the FCM membership function on the right side of the
image. The AFCM membership function, however, is markedly cleaner and does not
exhibit the same fading. Figures 5e and 5f show the contour of where the GM
membership function is equal to the white matter membership function, overlaid
on the PD-weighted slice. The inhomogeneity can have the effect of shifting the
apparent boundaries between tissue classes. On the upper right hand side of
Fig. 5e, the FCM contour has shifted inward towards the center of the image while
on the left of the image, the contour has shifted outward. The AFCM contour,
however, conforms much more accurately to the GM-WM boundary as seen on the
original images.

Fig. 4. Correction of inhomogeneity using TM-AFCM: (a) slice from original MR
image, (b) MR slice after AFCM correction, (c) gain field computed using AFCM, (d)
histogram of slice before correction, (e) histogram after correction.

Abstract. Physicians often perform diagnoses based on the evolution of
lesions, tumors or anatomical structures through time. The objective of
this paper is to automatically detect regions with apparent local volume
variation with a vector field operator applied to the local displacement
field obtained after a non-rigid registration between successive temporal
images. By studying the information on apparent shrinking areas in the
direct and reverse displacement fields between images, we are able to
segment evolving lesions. We then propose a method to segment lesions
in a whole temporal series of images. In this paper we apply this approach
to the automatic detection and segmentation of multiple sclerosis lesions
in time series of MRI images of the brain.



1 Introduction 

1.1 Multiple Sclerosis Data 

Multiple sclerosis is a progressive disease whose evolution must be studied
through time. The evolution of the disease can be followed on a patient with a
temporal series of examinations. A time series of 3D images of a patient is
acquired from the same modality and with a definite protocol to have similar
properties: similar histogram, field of view, voxel size, image size, etc. In this
paper we use two sets of multiple sclerosis time series composed of T2-weighted
MRI images. These two time series come from the Brigham and Women's Hospital^1
and from the BIOMORPH^2 European project. The data from the Brigham and
Women's Hospital consist of 256 x 256 x 54 images, with a voxel size of
0.9 x 0.9 x 3.0 mm. The temporal interval between two images of the series is
about one week. The data from the BIOMORPH project consist of 256 x 256 x 24
images with a voxel size of 0.9 x 0.9 x 5.0 mm. The temporal interval between two
images of the series is about four weeks.



^1 Dr. Guttman and Dr. Kikinis

^2 http://www.cs.unc.edu/~styner/biomorph/biomorph.html

A. Kuba et al. (Eds.): IPMI'99, LNCS 1613, pp. 154-167, 1999.

1.2 Quantitative Measurements

A quantitative analysis is required both to give accurate and reproducible
results and because the data are large. Between two examinations, a patient does
not have the same position in the acquisition device. Therefore, images acquired
at different times are not directly comparable. We have to apply a transformation
to each image to compensate for the difference in position (translation) and
orientation (rotation). Then we can compare the two images, and apply automatic
computerized tools to detect and quantify evolving processes. There are several
existing automatic methods to study the lesions of multiple sclerosis in time
series:

- With a single image, it is possible to threshold or to study the image intensity
  to segment lesions [?]. Unfortunately, thresholding does not always make it
  possible to distinguish the lesions from the white matter.

- It is possible to subtract two successive images to find areas where the
  lesions have changed. But this method has two major problems. First, the
  subtraction is extremely dependent on the rigid registration. For instance,
  we show in Fig. 1 an evolving lesion that appears in the image of
  the subtraction as a dark hole. But when the registration is inaccurate, it is
  hard to distinguish evolving lesions: the edges of the anatomical structures
  appear (cortex, ventricles, etc.) and give the same apparent information as
  the lesions. Secondly, the subtraction only characterizes the difference of
  intensity between two images. The image of the subtraction does not give a
  contrasted image with respect to the evolution ratio, but only with respect
  to the difference between the intensity of the lesion and the intensity of the
  background. For example, we show in Fig. 1 that if we threshold the image
  of the subtraction, only some parts of the evolving structures are detected.
  Moreover, the threshold value is not related to the amplitude of the
  evolutions, as can be seen in Fig. 1 where a series of threshold values is
  applied to a synthetic example.

- With n images, it is possible to follow the intensity of each voxel in time
  [?]. Although very nice results are obtained with perfectly rigidly aligned
  images, the approach remains sensitive to the rigid registration, and there is
  no direct relation between the amplitude of evolution and the variation of
  voxel intensity. Moreover, this method does not take into account the spatial
  correlation between neighboring voxels.

Fig. 1. Different threshold values applied to an image of subtraction. For each value,
only some parts of the evolving structures are detected. Moreover, the threshold value
is not related to the amplitude of the evolutions.

Fig. 2. Method of detection and segmentation of evolving processes using the
displacement field.



1.3 A New Method Based on the Displacement Field 

Our idea is thus to avoid a voxel by voxel comparison and to use the “apparent”
motion between two images. Figure 2 shows the different stages of the automatic
processing and gives an overview of this paper. First, images are aligned by a rigid
registration. Then we compute the displacement field to recover the “apparent”
motion between images with a non-rigid registration algorithm. We focus on the
detection of the regions of interest of the field thanks to vector field operators,
and use them to segment evolving lesions. This work is a natural continuation
of the previous research work of Thirion and Calmon [?].

2 Computation of the Displacement Field 

2.1 Rigid Registration 

First we compute a rigid registration with an algorithm which matches
“extremal” points defined as the maxima of the crest lines of the images [?].
Feature points called “extremal” points are automatically extracted from the 3D
image. They are defined as the loci of curvature extrema along the “crest lines”
of the isosurface corresponding to the zero-crossing of the Laplacian of the
image. Based on those stable points, a two-step registration algorithm computes a
rigid transformation. The first step, called “prediction”, looks for triplets of
points from the two sets which can be put into correspondence with respect to
their invariant attributes. The second step, called “verification”, checks whether
the 3D rigid transformation computed from the two corresponding triplets is valid
for all the other points. A study of the accuracy of this algorithm, especially for
aligning MS data, can be found in [?].

Fig. 3. An example of the computation of the “apparent” displacement field thanks to
a non-rigid registration algorithm (panels: image 1, image 2, displacement field
(zoom)). Notice how it emphasizes the shrinking lesion.



2.2 Non-rigid Registration 

We compute the 3D displacement field with a non-rigid algorithm based on
local diffusion [?]. This algorithm diffuses the first image into the second one.
Each point of the second image “attracts” or “repels” the point that has the
same coordinates in the first image according to their difference of intensity.
All these forces are regularized and deform the second image. The process is
iterated based on a multi-scale scheme. At the end, each point $P = (x, y, z)^T$
of the reference image has a vector $u = (u_1(P), u_2(P), u_3(P))$ that gives its
apparent displacement (cf. Fig. 4). We can also define the deformation, which is a
function $\Phi = (\Phi_1(P), \Phi_2(P), \Phi_3(P))$ that transforms the point
$P = (x, y, z)^T$ into the point $P' = (x', y', z')^T$. We have thus:

$$\begin{cases}
x' = x + u_1(x, y, z) = \Phi_1(x, y, z) \\
y' = y + u_2(x, y, z) = \Phi_2(x, y, z) \\
z' = z + u_3(x, y, z) = \Phi_3(x, y, z)
\end{cases}$$

This apparent displacement field u gives an idea of the time evolution between
two images. We can compute the two fields: from image 1 to image 2, and from
image 2 to image 1, which contain complementary information as we will see in
a later section. Figure 3 shows the vector field from 1 to 2 around a lesion,
emphasizing a radial shrinking. The vector field operators should transform a 3D
vector field into a simpler representation, namely a 3D scalar image. This scalar
image should be contrasted with respect to the time evolutions. Moreover, we need
to introduce operators that have a physical meaning for a better interpretation.

Fig. 4. u(P) is the apparent displacement of P at time 1. P' = P + u(P) is the apparent
location of P at time 2. The Jacobian of the apparent deformation measures the local
volume variation (see text).



3 The Jacobian Operator 



3.1 Mathematical Expression and Physical Meaning 



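We introduce as an operator the Jacobian of the deformation function
$\Phi = (\Phi_1(P), \Phi_2(P), \Phi_3(P))$ at point P, as suggested in [?]. This
operator is widely used in continuum mechanics [?]. The Jacobian of $\Phi$ at
point P is defined as:

$$\mathrm{Jac}_P(\Phi) = \det(\nabla_P \Phi)$$

It can also be written with the vector displacement field $u = (u_1, u_2, u_3)$
at P:

$$\det(\nabla_P \Phi) = \det(Id + \nabla_P u) =
\begin{vmatrix}
\frac{\partial u_1}{\partial x} + 1 & \frac{\partial u_1}{\partial y} & \frac{\partial u_1}{\partial z} \\
\frac{\partial u_2}{\partial x} & \frac{\partial u_2}{\partial y} + 1 & \frac{\partial u_2}{\partial z} \\
\frac{\partial u_3}{\partial x} & \frac{\partial u_3}{\partial y} & \frac{\partial u_3}{\partial z} + 1
\end{vmatrix}$$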



It is useful to recall a physical interpretation of the Jacobian operator in terms of
local variation of volume. With the notation of the accompanying figure, $u(P)$ is the
apparent displacement of $P$ at time 1, and $P' = P + u(P)$ is the apparent location of
$P$ at time 2. The volume $\delta V$ of the elementary tetrahedron defined by
$(P, P + \delta x, P + \delta y, P + \delta z)$ is given by:

$$\delta V = \frac{1}{6}\,\delta x\,\delta y\,\delta z.$$

As we assume that $\delta x$ is small, a first order approximation of the deformation
$\Phi$ in $P$ is given by
$\Phi(P + \delta x) = \Phi(P) + \frac{\partial \Phi}{\partial x}\,\delta x + O(\delta x^2)$.
We have the same approximation in the $y$ and $z$ directions. Thus the volume
$\delta V'$ of the deformed elementary tetrahedron is:

$$\delta V' \approx \frac{1}{6}\,\det\!\left(\frac{\partial \Phi}{\partial x}\,\delta x,\;
\frac{\partial \Phi}{\partial y}\,\delta y,\; \frac{\partial \Phi}{\partial z}\,\delta z\right)
= \frac{1}{6}\,\mathrm{Jac}_P(\Phi)\,\delta x\,\delta y\,\delta z.$$

Therefore:

$$\delta V' \approx \mathrm{Jac}_P(\Phi) \cdot \delta V.$$

Thus, the local variation of an elementary volume is given (as a first order
approximation) by the Jacobian of the deformation function $\Phi$. When
$\mathrm{Jac}_P(\Phi) > 1$ there is a local expansion at point $P$, and when
$\mathrm{Jac}_P(\Phi) < 1$ there is a local shrinking at point $P$. The transformation
locally preserves the volume when $\mathrm{Jac}_P(\Phi) = 1$.
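The definition above can be illustrated with a minimal Python sketch that evaluates $\det(\mathrm{Id} + \nabla_P u)$ at a point using central finite differences. This is only an illustration under stated assumptions, not the recursive-filtering implementation described later; the helper names `det3` and `jacobian_at` and the two example fields are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def jacobian_at(u, p, h=1e-3):
    """det(Id + grad u) at point p, with central finite differences of step h.

    `u` is any callable mapping a 3D point to a 3D displacement (assumption).
    """
    grad = [[0.0] * 3 for _ in range(3)]
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        u_plus, u_minus = u(plus), u(minus)
        for comp in range(3):
            grad[comp][axis] = (u_plus[comp] - u_minus[comp]) / (2.0 * h)
    # Add the identity to obtain the gradient of the deformation Phi = Id + u.
    m = [[grad[r][c] + (1.0 if r == c else 0.0) for c in range(3)]
         for r in range(3)]
    return det3(m)


def shrinking_field(p):
    """Hypothetical field u(P) = -0.2 P, i.e. Phi(P) = 0.8 P (uniform shrinking)."""
    return [-0.2 * c for c in p]


def translation_field(p):
    """Hypothetical pure translation: no change of volume."""
    return [3.0, 0.0, 0.0]


jac_shrink = jacobian_at(shrinking_field, [1.0, 2.0, 3.0])
jac_translate = jacobian_at(translation_field, [1.0, 2.0, 3.0])
```

For the uniform shrinking field the Jacobian is $0.8^3 = 0.512$ (each volume element loses about half its volume), while a pure translation gives a Jacobian of 1.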



3.2 Robustness of the Jacobian with Respect to Misalignment 

Figure 5 shows what happens when two images are not perfectly aligned: the
deformation function $\Psi$, which is measured, is different from the ideal one $\Phi$.
The misregistration is given by a residual rotation $R$ and translation $t$. We have
$\Psi = R \circ \Phi + t$.



Fig. 5. $\Phi$ is the deformation function for a perfect rigid registration, and $\Psi$ is the
deformation function when there is a misregistration $(R, t)$. We have $\Psi = R \circ \Phi + t$



Then, since $R$ is a rotation and therefore $\det(R) = 1$, we have:

$$\mathrm{Jac}(\Psi) = \det(\nabla \Psi) = \det(\nabla (R \circ \Phi + t)) = \det(R \cdot \nabla \Phi)
= \det(R)\,\det(\nabla \Phi) = \mathrm{Jac}(\Phi).$$

Therefore the Jacobian of the theoretical deformation function (for a perfect 
rigid registration) is equal to the Jacobian of a measured deformation function 
(whatever the misregistration). Of course this requires that, even in the case of 
an approximate alignment of images, the non-rigid registration still computes a 
correct displacement field. In our case the rigid registration is performed because 
our non-rigid registration algorithm requires a proper initial alignment to give a 
good result. Nevertheless, the rigid registration does not have to be as accurate 
as for the subtraction method where a precision better than or equal to one voxel 
is required. 
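The invariance argument of this section can be checked numerically. The following Python sketch, under the assumption of an arbitrary (hypothetical) deformation gradient `grad_phi` and a residual rotation of 1 degree about the z axis, compares $\det(\nabla\Phi)$ with $\det(R \cdot \nabla\Phi)$; the translation $t$ does not appear because it vanishes under differentiation.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def matmul3(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def rotation_z(angle_deg):
    """Rotation matrix about the z axis (angle in degrees)."""
    a = math.radians(angle_deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


# Hypothetical gradient of the ideal deformation Phi at some point P.
grad_phi = [[0.90, 0.10, 0.00],
            [0.00, 0.80, 0.05],
            [0.02, 0.00, 0.70]]

# Gradient of the measured deformation Psi = R o Phi + t.
grad_psi = matmul3(rotation_z(1.0), grad_phi)

jac_phi = det3(grad_phi)
jac_psi = det3(grad_psi)
```

Both determinants agree up to floating-point rounding, which is the property exploited here: a residual rigid misalignment changes the displacement field but not its Jacobian.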



3.3 Computation and Application of the Jacobian

We have seen that the computation of the Jacobian of the deformation $\Phi$ can
be performed directly with the displacement field $u$. We need to compute the
9 first-order derivatives of the displacement field $u$:
$\frac{\partial u_1}{\partial x}, \ldots, \frac{\partial u_3}{\partial z}$. For a faster
computation we use recursive filtering, which gives an image for each derivative.
We then need to store the 9 derivatives in memory to compute the Jacobian, and
for an image of 256 x 256 x 180 voxels this requires about 425 Mbytes of memory.
To avoid exhausting the memory, we compute the Jacobian on sub-images
and then fuse the different sub-results, which include an overlapping border
to avoid side effects.
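The memory figure and the sub-image strategy can be made concrete with a short Python sketch. It assumes 4 bytes per stored value (a 32-bit float, which is consistent with the quoted figure but not stated in the text); the block size and overlap used in the example are hypothetical.

```python
def derivative_storage_bytes(shape, n_derivatives=9, bytes_per_value=4):
    """Memory needed to hold all derivative images at once.

    bytes_per_value=4 is an assumption (32-bit floats).
    """
    nx, ny, nz = shape
    return nx * ny * nz * n_derivatives * bytes_per_value


def overlapping_blocks(length, block, overlap):
    """Yield (start, stop) index ranges along one axis.

    Each core block of size `block` is padded by `overlap` voxels on both
    sides (clipped to the image), so that derivative filters see valid
    neighbours at the block border.
    """
    for core_start in range(0, length, block):
        core_stop = min(core_start + block, length)
        yield max(core_start - overlap, 0), min(core_stop + overlap, length)


total_bytes = derivative_storage_bytes((256, 256, 180))
z_blocks = list(overlapping_blocks(180, block=60, overlap=4))
```

With these assumptions `total_bytes` is 424,673,280, i.e. about 425 Mbytes, and splitting the 180 slices into three padded blocks gives the ranges (0, 64), (56, 124) and (116, 180); only the core of each block would be kept when the sub-results are fused.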

The Jacobian gives a contrasted image with respect to the evolution amplitude.
The most contrasted areas tend to correspond to shrinking or growing
lesions. In Fig. 6 we see that an important shrinking of a lesion between two
images gives a dark region in the Jacobian image. In other areas, the value is
almost constant and very close to 1, which indicates no apparent variation of
volume. A zoom around a lesion shows that darker areas correspond to shrinking
lesions.




Fig. 6. Application of the Jacobian: we can see a lesion that shrinks 



3.4 Other Operators

Calmon and Thirion have developed another vector field operator based on the
divergence and the norm of the displacement field $u$ [?]:

$$\mathrm{norm} \cdot \mathrm{div}(P) = \|u(P)\| \, \mathrm{div}\, u(P)
= \|u(P)\| \left( \frac{\partial u_1}{\partial x} + \frac{\partial u_2}{\partial y}
+ \frac{\partial u_3}{\partial z} \right).$$

This operator has no simple physical meaning, even if the sign of the operator
gives information about shrinking (negative values) or expansion (positive values).
As we have no physical interpretation of the value, it is difficult to threshold
the image automatically in order to extract the regions of interest.

Prima et al. proposed another operator which gives the local variation of
volume [?]. A cell of voxels of volume $V_1$ is deformed to a complex polyhedron
whose volume $V_2$ is computed. Then $(V_2 - V_1)/V_1$ is calculated. Note that another
algorithm to compute $V_2$ is given in [?]. This operator is directly related to the
Jacobian:

$$\frac{V_2 - V_1}{V_1} = \frac{V_2}{V_1} - 1 \approx \mathrm{Jac} - 1.$$



Figure 7 shows the application of these three operators on the same
displacement field. In particular, we can notice how similar the Jacobian and the
discrete computation of the relative variation of volume are. The advantage of
our approach is that it provides a continuous framework for a computation of
the Jacobian at any scale.
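To make the difference between these operators tangible, the following Python sketch evaluates them analytically for a hypothetical uniform scaling $\Phi(P) = sP$, i.e. $u(P) = (s - 1)P$. For this field $\mathrm{div}\,u = 3(s - 1)$ and $\mathrm{Jac} = s^3$; the function name and the sample points are assumptions made only for illustration.

```python
import math


def operators_for_uniform_scaling(s, p):
    """Return (norm*div, Jac - 1) at point p for the field u(P) = (s - 1) P."""
    u = [(s - 1.0) * c for c in p]
    norm_u = math.sqrt(sum(c * c for c in u))
    div_u = 3.0 * (s - 1.0)          # trace of the gradient of u
    jacobian = s ** 3                # det(s * Id)
    return norm_u * div_u, jacobian - 1.0


near = operators_for_uniform_scaling(0.8, [1.0, 2.0, 2.0])
far = operators_for_uniform_scaling(0.8, [2.0, 4.0, 4.0])
```

In this example the relative variation of volume $\mathrm{Jac} - 1 = -0.488$ is the same at both points, whereas the norm-times-divergence value doubles (from $-0.36$ to $-0.72$) simply because the displacement is larger farther from the origin. This illustrates why a single threshold is easier to justify on the Jacobian.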




Fig. 7. Comparison between different existing operators. (a): $\|u\| \, \mathrm{div}\, u$. (b): discrete
computation of $(V_2 - V_1)/V_1 \approx \mathrm{Jac}(\Phi) - 1$. (c): Jacobian



4 Results

4.1 Thresholding and Segmentation 

We can extract the areas that correspond to a significant time evolution. It is
possible to find a uniform threshold over the whole Jacobian image relying on its
physical interpretation in terms of local variation of volume. We chose an empirical
threshold of 0.3 for significant shrinking. An example in Fig. 8 shows that it gives
a good segmentation of a shrinking lesion.



Fig. 8. The threshold $\det(\nabla \Phi) < 0.3$ makes it possible to segment shrinking lesions



In fact, we are going to focus only on the shrinking areas. We can see in Fig. 9 that
a better description is provided with the shrinking field. If there is an important
local expansion between images 1 and 2, we would need a one-to-many mapping
due to the limited resolution of the image. To avoid this, we consider only shrinking
regions from 1 to 2, and then shrinking regions from 2 to 1. By thresholding
shrinking areas we obtain the segmentations $s_{1 \to 2}$ in the first image, and $s_{2 \to 1}$
in the second image. Then we have to combine these two pieces of information: the whole
segmentations in images 1 and 2 are given by
$S_{12}(t_1) = [s_{1 \to 2}] \cup [u_{2 \to 1}(s_{2 \to 1})]$
and $S_{12}(t_2) = [s_{2 \to 1}] \cup [u_{1 \to 2}(s_{1 \to 2})]$. Figure 10 shows automatic
segmentation results obtained at two times.
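The combination rule can be sketched in a few lines of Python. This is a simplified illustration under several assumptions: Jacobian images and displacement fields are represented as dictionaries keyed by voxel index, propagated voxels are rounded to the nearest grid position, and all function names and sample values are hypothetical.

```python
def threshold_shrinking(jacobian, threshold=0.3):
    """Set of voxels whose Jacobian is below the shrinking threshold."""
    return {voxel for voxel, value in jacobian.items() if value < threshold}


def propagate(mask, displacement):
    """Map each voxel of `mask` through a displacement field.

    Rounding to the nearest voxel is an assumption of this sketch.
    """
    return {tuple(round(c + d) for c, d in zip(voxel, displacement[voxel]))
            for voxel in mask}


def combined_segmentation(s_here, s_other, u_other_to_here):
    """Union of the local shrinking mask and the propagated remote one."""
    return s_here | propagate(s_other, u_other_to_here)


# Hypothetical toy data: Jacobian of the field 1->2 sampled in image 1,
# Jacobian of the field 2->1 sampled in image 2, and the field 2->1.
jac_1_to_2 = {(0, 0, 0): 0.2, (1, 0, 0): 1.0}
jac_2_to_1 = {(5, 0, 0): 0.1, (6, 0, 0): 0.95}
u_2_to_1 = {(5, 0, 0): (-2.1, 0.0, 0.2), (6, 0, 0): (0.0, 0.0, 0.0)}

s_1_to_2 = threshold_shrinking(jac_1_to_2)
s_2_to_1 = threshold_shrinking(jac_2_to_1)
segmentation_t1 = combined_segmentation(s_1_to_2, s_2_to_1, u_2_to_1)
```

Here the voxel that shrinks from image 1 to image 2 is kept as is, and the voxel that shrinks from image 2 to image 1 (an expansion when seen from image 1) is carried back into image 1 by the reverse field before the union is taken.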

With the fields between images 1 and 2 and between images 2 and 3, we can
compute segmentations $S_{12}$ in images 1 and 2 and $S_{23}$ in images 2 and
3. Then we propagate the segmentations $S_{12}$ and $S_{23}$ respectively to times $t_3$
and $t_1$, thanks to the vector fields $u_{21}$ and $u_{23}$. Then, by addition, we obtain a
segmentation of the lesions in all the images of a series. In Fig. 11 we can
see the result of this method on three successive instances.

4.2 Study on a Synthetic Example 

We have created two images $I_1$ and $I_2$ by including two artificial evolving 3-D
lesions into the same 3-D T2 weighted image of a brain without lesions. The
artificial lesions are represented by spheres of radius respectively 10mm and 4mm
in $I_1$, and 6mm and 8mm in $I_2$ (Fig. 12(a)). Because the global rigid registration
of $I_1$ and $I_2$ is the identity in this case, we have only applied the non-rigid
registration algorithm to compute the direct and reverse local displacement fields
everywhere. We have then applied our method to extract the boundary of evolving
regions, with $\mathrm{Jac}(\Phi) < 0.3$. Results in Fig. 12(c) show that the evolving
regions are correctly detected. The delimitation of the boundary
is qualitatively correct, but we observed a difference of between 5 and 20 percent
between the correct diameter of the lesions and the measured one.



[Figure 9 panel labels: field from 1 to 2 (expansion); field from 2 to 1 (shrinking);
legend: evolving lesion or anatomical structure]

Fig. 9. The information is richer when we look at the shrinking field. Left: If there is
a large expansion, the direct displacement field cannot express that one voxel should
deform to several voxels. We would need a one-to-many mapping due to the limited
resolution of the image. Right: Thanks to the reverse field, a better description of the
phenomenon is possible



Fig. 10. Segmentation of evolving lesions. Left: Brigham & Women’s Hospital data.
Right: BIOMORPH data



4.3 Robustness with Respect to Imperfect Rigid Registration 

From the previous example, we also created an image $I_2'$ by translating $I_2$ by 3
voxels in one direction. As expected, our method provides similar results when
applied to $I_1$ and $I_2'$ (Fig. 12(e)), while a simple difference yields very noisy results
(Fig. 12(d)).



Fig. 11. Thanks to the segmentation of the evolutions between times 1 and 2, and
between times 2 and 3, it is possible to visualize the evolution of the lesions between the 3
successive acquisitions

We also considered the application of our method between two real T2
weighted MR images, $Im_1$ and $Im_2$ (the same 3D images as the ones presented in an
earlier figure). When $Im_1$ and $Im_2$ are perfectly rigidly registered, our method produces
the segmentation of an evolving lesion in the cross-section shown in Fig. 13,
which can be compared to a simple difference analysis between the registered
images (Fig. 13). We also created an image $Im_2'$ by adding a misalignment to $Im_2$
corresponding to a rotation of 1 degree around an axis orthogonal to this cross-section
and passing through its center, plus a translation of 1 voxel in the two
directions of the plane of this cross-section. We observe that the results provided
by our method (Fig. 13) remain similar to the results obtained with perfect registration,
whereas a simple difference now produces very noisy results (Fig. 13).

5 Conclusion 

In this article we proposed a new method to study the evolution of multiple sclerosis
lesions through time, based on the apparent displacement field between images.
We believe that our approach will be useful to detect evolving regions
corresponding to local apparent expansion or shrinking. As this method is robust
with respect to imperfect rigid alignment, we plan to use it in combination with



Automatic Detection and Segmentation of Evolving Processes 165 




Fig. 12. (a): two synthetic temporal images $I_1$ and $I_2$. (b): the Jacobian image of the
field from $I_1$ to $I_2$ and $I_2$ to $I_1$. (c): automatic segmentation of evolving lesions in $I_1$
and $I_2$ using $\mathrm{Jac}(\Phi) < 0.3$. (d): $I_2 - I_1$ on the left. On the right $I_2' - I_1$, where $I_2'$ is a
translated version of $I_2$. (e): automatic segmentation of evolving lesions in $I_1$ and $I_2'$,
which shows robustness to imperfect rigid registration of images



other segmentation algorithms in order to delineate more precisely the boundary
of the lesions in temporal sequences. Then we will compare our results with
manual and other automatic segmentation results [?]. This will be done within
the BIOMORPH project. Finally, we plan to apply our approach to study the
“mass effect” by quantifying the evolution of anatomical structures such as the
cerebral ventricles or the interface between grey matter and white matter.



Abstract. Inter-subject non-rigid registration of cortical anatomical
structures as seen in MR is a challenging problem. The variability of
the sulcal and gyral patterns across patients makes the task of registration
especially difficult regardless of whether voxel- or feature-based
techniques are used. In this paper, we present an approach to matching
sulcal point features interactively extracted by neuroanatomical experts.
The robust point matching (RPM) algorithm is used to find the optimal
affine transformations for matching sulcal points. A 3D linearly interpolated
non-rigid warping is then generated for the original image volume.
We present quantitative and visual comparisons between Talairach, mutual
information-based volumetric matching and RPM on five subjects’
MR images.



1 Introduction 

The recent development of brain imaging technologies (PET, MRI, fMRI) has 
provided rich information on the human brain. A potentially fruitful emerging 
area of research is human brain mapping which requires a comprehensive 
statistical analysis of brain structure and function across diverse populations 
and different imaging modalities. A major requirement in brain mapping is that 
the imaging data from different subjects and modalities have to be placed in a 
common reference frame. Recent efforts have focused on using anatomical MR 
as the basis for such registration. 

Inter-subject anatomical registration is a difficult task due to the complexity
and variability of brain structures. This is most obvious in the cortical regions.
The folding of the cortical surfaces - the sulci and gyri - varies dramatically from
person to person and, in some cases [?], some folds are not even present in every
subject. However, the folding pattern is not completely arbitrary. In fact, the sulci
often serve as important cortical landmarks. Furthermore, many cortical areas 
have been associated with critical brain functionalities (vision, language, motor 
control etc.) with the sulci often representing important functional boundaries. 
Cortical registration, despite its enormous difficulty, is hence highly desirable as 
a basis for further statistical quantitative analysis. 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 1 SS- TT^ 1999. 

Our approach is based on matching feature points representing the sulcal
structures. The points were obtained using a tool [?] which allows a
neuroanatomy expert to interactively trace sulci on a 3D skull-stripped MRI brain
volume. In contrast to just choosing a few landmarks, the tool allows us to
represent sulci using hundreds of 3D points. Also, major sulci can be identified and
easily labeled.

We then match two sets of labeled sulci (extracted from two subjects’ MRI)
using a robust point matching (RPM) algorithm [?]. The method first determines
the best global 3D affine transformation that brings the two sets of sulci
into register. Then piecewise affine transformations are solved for each sulcus
to further refine the registration. Afterward, a linearly weighted 3D volumetric
warping is generated from the piecewise affine mapping.

RPM has been previously developed and used for 2D rigid alignment [?] and
2D affine warping [?]. For the first time, we have developed the technique for
3D affine and piecewise affine warping and applied it to real 3D sulcal features.
Embedded within a deterministic annealing scheme, RPM allows us to jointly
estimate the spatial mapping (affine, piecewise affine) and the point-to-point
sulcal correspondences. Moreover, some sulcal structures in one subject may not
have corresponding homologies in the other. RPM is able to reject a fraction of
such non-homologies as outliers. Unlike other methods of point feature registration,
RPM returns a one-to-one correspondence between sulcal points. Except
for the extraction of the sulci, the whole process is done automatically and the
registration and warping of one pair of brains only takes a few minutes.



2 Review 

There are two principal approaches to non-rigid brain registration: voxel-based 
methods and feature-based methods. 

Voxel-based approaches try to find the optimal transformation such that a
local image intensity similarity measure is maximized. Most methods in this
class allow highly complex transformations, whose number of degrees of freedom
is normally proportional to the size of the volume. Elastic media models, viscous
fluid models [?] or local smoothness models [?] are introduced as constraints to
guide the non-rigid spatial mapping. From these efforts, the need for non-rigid
transformations is by now quite clear. Note, however, that these algorithms are
driven by local voxel intensities. Each voxel is treated equally without taking
advantage of higher level geometric information (such as the sulcal and gyral
patterns used here). In these methods, further anatomic validation is necessary
to ensure that homologous sulci are indeed matched. To address this shortcoming,
landmarks were used as an initial step in [?]. As [?] also pointed out in their
recent work, the voxel intensity approach worked well for deep subcortical
structures, but sometimes had difficulty aligning sulci and gyri. To correct this,
[?] used a chamfer distance measure [?] on sulcal points and combined it with their
former voxel-based matching (ANIMAL) framework. All of these efforts are attempting
to incorporate neuroanatomic geometric feature expertise into their registration
engines.

Feature-based methods, as the name implies, capitalize on the information
from different identifiable brain structures. Features which represent important
brain structures are extracted. The features run the gamut of landmark points
[?], curves or surfaces [24, ?]. Subsequently, these methods attempt to solve
for the correspondence and transformation between the features. The spatial
transformations resulting from feature matching are then propagated to the 
whole volume. Underlying the philosophy of feature matching is that homol- 
ogous features always provide an effective anchor for registration. However, en- 
thusiasm for these methods is usually tempered not only by the difficulty of 
feature extraction but also by the difficulty of simultaneously determining the 
correspondences or homologies and the spatial mapping. The first problem - 
feature extraction - usually calls for some residual manual intervention while 
the second problem - automated matching - involves the computationally de- 
manding task of determining the correspondences and the spatial mapping. As 
our method basically belongs to this category, we discuss previous methods in 
some detail and compare them to ours. 

Bookstein [2] pioneered the usage of landmark points for registration and
shape analysis. The thin-plate spline is used as the spatial mapping between the
two landmark point sets to generate an elastic transformation in which the bending
energy is minimized. Since this method basically relies on a few landmark
points, the accuracy of their locations is essential. The homologies between all
landmark points are deemed known (in advance). In contrast, in our approach,
the correspondences and the spatial mapping are co-determined from hundreds 
of sulcal feature points. In addition, the anatomical variability between subjects 
can create many outliers, i.e., sulcal points which do not match. Since we are us- 
ing hundreds of points to represent the structural information, it is statistically 
much more robust and the noise or point “jitter” which may be caused by various 
sources such as the tracing process or sampling error, should not significantly 
affect the final result. 

In [?], 3D active surfaces are used to extract the surfaces of the lateral ventricle
and outer cortex, which are developmentally fundamental for the brain. An
initial surface is first constructed from some fiducial points and is then relaxed
towards the edges until a final balance is reached between the edge attraction
force and the surface smoothness measure. To better represent the deep cortical
structures (sulci), parametric mesh surfaces are also interactively extracted.
A point-to-point mapping between the two surfaces is then calculated and a
linearly weighted 3D volumetric warping is generated. [?] has a similar framework
where surface curvature maps at different scales are used for different brain
structures. More consideration is given to the inhomogeneity within the brain.
A more sophisticated elasticity model makes the algorithm more flexible at the
ventricles and more powerful to account for some abnormal cases where, for example,
tumors are involved. Both methods emphasize the importance of sulcal
alignment and, not surprisingly, the validation in [?] has shown that anatomically
homologous points can be accurately aligned. As our method is based
on matching cortical structures, it is quite similar to both of these approaches.
However, we use point-sets as a representation for the sulci rather than surfaces.
The major sulci are labeled, which imposes strong constraints on the matching.
Moreover, the non-rigid matching of 3D surfaces (parameterized by surface
normals for example) is a difficult problem. The parameterization of cortical
structures as point-sets allows us to easily utilize Procrustes methods of shape
analysis [?] (by equating the atlas with the Procrustes mean). Eigenanalysis
of the error covariance matrix (around the Procrustes mean atlas) also yields
valuable information regarding the dominant modes of deformation present in a
population [?].

We have presented a detailed review of competing approaches to solving
point correspondence problems elsewhere [?]. Here we briefly discuss chamfer
distances and the iterated closest point (ICP) matching algorithm [?]. The
chamfer distance has been used in cortical registration by [22] and [?]. The main
problem with the chamfer distance is that it uses a brittle nearest neighbor
measure to assign correspondence. Nearest neighbor methods used in chamfer
matching and ICP are problematic in the vicinity of outliers since they generate
local minima [?]. Unlike the chamfer matching in [?], where a distance image
is calculated from the Euclidean distance from each voxel to its nearest sulcal
point feature, we directly use the sulcal point feature locations for the matching.
Finally, we should mention the work presented in [?], where a maximum
clique approach is taken to matching relational sulcal representations. Maximum
clique is a very difficult NP-complete problem [?], which in this case increases
the likelihood of getting stuck in local minima. Also, it is difficult to explicitly
model non-rigid spatial mappings in the maximum clique approach [?].
Consequently, the “engine” that does the work has to be pure sulcal correspondences,
making the problem more difficult.

3 Robust Sulcal Matching 

3.1 Softassign and Deterministic Annealing 

There are two important factors that make RPM different from other point 
matching methods. These two factors mostly account for RPM’s robustness, 
which proved to be well suited for matching of the complex sulcal patterns. 

The first is the softassign technique. Let’s suppose we have two point sets
$\{X_i, i = 1, 2, \ldots, N_1\}$ and $\{Y_j, j = 1, 2, \ldots, N_2\}$, where $N_1$ and $N_2$ are the
numbers of points in each set. (We are using homogeneous
coordinates with a 4 x 4 affine spatial mapping so that the whole transformation
can be simply written as $AX_i$.) The point matching problem is then equivalent
to solving the following optimization problem:

$$\min_{M,A} E(M, A) = \min_{M,A} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} M_{ij} \|X_i - A Y_j\|^2
- \alpha \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} M_{ij} \qquad (1)$$

subject to: $\sum_{i=1}^{N_1+1} M_{ij} = 1, \forall j \in \{1, \ldots, N_2\}$,
$\sum_{j=1}^{N_2+1} M_{ij} = 1, \forall i \in \{1, \ldots, N_1\}$,

where $M_{ij} \in \{0, 1\}$. Matrix $A$ represents the set of transformation parameters
we are trying to solve for. $M$ is the binary correspondence matrix [?] with an
extra row and an extra column introduced to account for outliers. The second
term in (1) controls the degree of robustness. The greater the value of $\alpha$, the fewer
points are rejected as outliers, and vice versa.

Obviously, the transformation parameters, represented by A, belong to the 
set of continuous variables; on the other hand, the correspondence matrix M is 
binary. The softassign technique provides a way to solve the optimization problem
with two such variables of different natures. Instead of forcing Mij to be binary, 
we relax it to be continuous in the interval [0, 1], but with the row and column 
sum constraints still intact. In addition to being just a numerical technique, it 
also gives us a new way of treating correspondence. Now, one point does not 
necessarily just correspond to only one other point; it could have multiple mem- 
berships with all others with one membership being much larger than the rest. 
This property is clearly desirable if you have one point in one set lying in between 
two points in the other set. It does not have to choose immediately which one 
it belongs to but instead keeps a degree of “fuzziness” while preferring the clos- 
est one a little bit more. This also suggests that during the registration process 
when the transformation is optimized gradually, the correspondence member- 
ships would change continuously and gradually as well without jumping around 
in the space of permutation matrices (and outliers). In more formal terms, making
the correspondences fuzzy smooths the energy function, ridding it of poor
local minima [?]. The fuzzy correspondence matrix still has to satisfy the row
and column constraints. It turns out that the Sinkhorn balancing procedure of
alternating row and column normalizations is an ideal vehicle to satisfy the row
and column constraints [?]. The softassign essentially keeps all correspondences
positive and then uses Sinkhorn’s theorem to ensure that all rows and columns 
sum to one (except for the outlier row and column). 
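The Sinkhorn balancing step described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the matrix is assumed to be strictly positive and of size $(N_1+1) \times (N_2+1)$, with the last row and column acting as outlier slots that are included in the sums but are not themselves normalized. The function name and the sample matrix are hypothetical.

```python
def sinkhorn_with_outliers(m, iterations=200):
    """Alternate row and column normalizations on a positive matrix.

    The last row and last column are outlier slots: rows 0..N1-1 are
    normalized over all N2+1 columns, and columns 0..N2-1 over all N1+1 rows.
    """
    n1, n2 = len(m) - 1, len(m[0]) - 1
    m = [list(row) for row in m]                 # work on a copy
    for _ in range(iterations):
        for i in range(n1):                      # row normalization
            row_sum = sum(m[i])
            m[i] = [value / row_sum for value in m[i]]
        for j in range(n2):                      # column normalization
            col_sum = sum(m[i][j] for i in range(n1 + 1))
            for i in range(n1 + 1):
                m[i][j] /= col_sum
    return m


# Hypothetical fuzzy correspondences for 3 + 3 points plus outlier slots.
m0 = [[0.9, 0.2, 0.1, 0.1],
      [0.3, 0.8, 0.2, 0.1],
      [0.1, 0.3, 0.7, 0.1],
      [0.1, 0.1, 0.1, 0.1]]
balanced = sinkhorn_with_outliers(m0)
row_sums = [sum(balanced[i]) for i in range(3)]
col_sums = [sum(balanced[i][j] for i in range(4)) for j in range(3)]
```

After the iterations, every non-outlier row and column sums to one (up to a small tolerance) while all entries stay positive, which is the relaxed, continuous counterpart of the binary constraints in (1).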

Another classic point matching method is the ICP algorithm [?]. ICP uses a
nearest neighbor heuristic to set binary correspondences. The algorithm iterates
between the spatial mapping and the nearest neighbor correspondences until
convergence. As in the chamfer distance [?], the brittleness of the nearest neighbor
measure can in many cases create local minima [?]. Some efforts have been
made to improve ICP’s robustness by including an adaptive thresholding [?].
Also, there is no guarantee that ICP will return one-to-one correspondences.
While correspondence does not have to be a pre-requisite for registration, it
does play a more significant role in the creation of probabilistic atlases; the
atlas formation step requires averages and covariance matrices to be computed 
over all the corresponding points in a training set. We expect the one-to-one 
correspondence returned by RPM to play a significant role in the formation of 
probabilistic atlases. 

Deterministic Annealing [?] is the other important technique used in RPM,
and it is a good companion to softassign. It is closely related to simulated annealing
except that all operations are deterministic. The temperature parameter $T$
in deterministic annealing specifies the degree of fuzziness of the correspondence
matrix - the higher the temperature, the greater the fuzziness. At each temperature,
the result from the previous temperature is used as the initial condition and a
straightforward deterministic descent is performed on the energy function. The process
is repeated at lower and lower temperatures until $M$ becomes almost binary.
The method is more robust than classical gradient methods in that more configurations
are allowed at higher temperature, and this makes the energy function
smoother and less vulnerable to local minima. At very low temperatures, RPM
is very similar to ICP with the added benefit of one-to-one correspondence.
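The effect of the temperature on the fuzziness of the correspondences can be seen in a tiny Python sketch. It assumes the exponential membership form used by RPM later in the paper, $M_{ij} \propto \exp(-Q_{ij}/T)$, and performs a single row normalization only (not the full Sinkhorn balancing); the squared distances and temperatures are hypothetical.

```python
import math


def soft_memberships(sq_distances, temperature):
    """Memberships of one point with respect to its candidate matches.

    Each membership is exp(-d / T), normalized so that the row sums to one.
    """
    weights = [math.exp(-d / temperature) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]


# One point lying between two candidates at squared distances 1 and 2.
hot = soft_memberships([1.0, 2.0], temperature=50.0)
cold = soft_memberships([1.0, 2.0], temperature=0.1)
```

At the high temperature the two memberships are almost equal (about 0.505 and 0.495), so the point does not commit to either candidate; at the low temperature the closest candidate receives a membership above 0.999, which is the nearly binary, ICP-like regime.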



3.2 The Spatial Mapping: 3D Affine Transformations

With the above background regarding softassign and deterministic annealing in 
place, it is reasonably straightforward to develop the method for a 3D affine 
spatial mapping. The complete form of the energy function is: 

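$$\min_{M,A} \max_{\mu,\nu} E(A, M)
= \min_{M,A} \max_{\mu,\nu} \Big\{ \sum_{ij}^{N_1,N_2} M_{ij} \|X_i - (A + I) Y_j\|^2
+ \lambda \, \mathrm{trace}(A^T A) - \alpha \sum_{ij}^{N_1,N_2} M_{ij}$$

$$+ \sum_{i}^{N_1} \mu_i \Big( \sum_{j}^{N_2+1} M_{ij} - 1 \Big)
+ \sum_{j}^{N_2} \nu_j \Big( \sum_{i}^{N_1+1} M_{ij} - 1 \Big)
+ T \sum_{ij}^{N_1,N_2} M_{ij} (\log M_{ij} - 1) \Big\}. \qquad (2)$$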

Even though there are six terms, only the first two are directly involved
when we solve for the transformation $A$ (actually $A + I$, where $I$ is the
identity transformation). The transformation $A$ is now in 3D. The first term is
the error measure. Assume for the moment that the correspondence $M$ is known.
The second term is the regularization on $A$. Basically we are assuming that the
affine transformation should be close to identity. The degree of deviation from
identity depends on $\lambda$. Typically, we begin with a high value of $\lambda$ and quickly
decrease it, the consideration being that at first the correspondences are
still far from the right answer and the transformation should not be too committed.
Though this may add some complexity to the algorithm, we have found
it worthwhile for two reasons. The first is that the algorithm does not seem to
be very sensitive to slightly different choices of $\lambda$ annealing schedules, i.e. as
long as the starting value is high enough that the transformations are not too
large in the beginning and the final value is small enough that the transformations
are not always forced too close to identity. The second reason stems from the
observation that, because of the extra constraint we put on the transformation,
we can choose not to use the robustness term $-\alpha \sum_{ij} M_{ij}$.
Actually, in all our experiments $\alpha$ was set to 0.

With $M$ held fixed, the energy function w.r.t. $A$ is:

$$E_{\mathrm{affine}}(A)|_M = \sum_{ij}^{N_1,N_2} M_{ij} \|X_i - (A + I) Y_j\|^2
+ \lambda \, \mathrm{trace}(A^T A) \qquad (3)$$

which is a standard least squares problem for the matrix $A$. By setting the
derivative $\partial E_{\mathrm{affine}} / \partial A = 0$, we get the closed-form solution for $A$:

$$A = \Big[ \sum_{ij}^{N_1,N_2} M_{ij} (X_i Y_j^T - Y_j Y_j^T) \Big] \cdot
\Big[ \sum_{ij}^{N_1,N_2} M_{ij} Y_j Y_j^T + \lambda I \Big]^{-1} = P \cdot Q. \qquad (4)$$
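The closed-form update for $A$ can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: points are stored in homogeneous coordinates as $(x, y, z, 1)$ (the ordering is an assumption), the small Gauss-Jordan inverse is a hypothetical helper that assumes a non-singular matrix, and the test data are synthetic.

```python
def mat_inverse(m):
    """Invert a small non-singular square matrix (Gauss-Jordan, partial pivoting)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def solve_affine(X, Y, M, lam):
    """A = P . Q with P = sum_ij M_ij (X_i - Y_j) Y_j^T and
    Q = (sum_ij M_ij Y_j Y_j^T + lam * I)^-1, for a fixed correspondence M."""
    d = len(Y[0])
    P = [[0.0] * d for _ in range(d)]
    B = [[lam if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            m = M[i][j]
            for r in range(d):
                for c in range(d):
                    P[r][c] += m * (x[r] - y[r]) * y[c]
                    B[r][c] += m * y[r] * y[c]
    Q = mat_inverse(B)
    return [[sum(P[r][k] * Q[k][c] for k in range(d)) for c in range(d)]
            for r in range(d)]


# Synthetic check: X_i = (A_true + I) Y_i with known one-to-one correspondences.
Y = [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
A_true = [[0.10, 0.02, 0.00, 0.50],
          [0.00, -0.05, 0.03, -0.20],
          [0.01, 0.00, 0.08, 0.10],
          [0.00, 0.00, 0.00, 0.00]]
X = [[y[r] + sum(A_true[r][k] * y[k] for k in range(4)) for r in range(4)]
     for y in Y]
M = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

A_est = solve_affine(X, Y, M, lam=0.0)     # no regularization: recovers A_true
A_reg = solve_affine(X, Y, M, lam=1e6)     # strong regularization: A near zero
```

With $\lambda = 0$ and exact correspondences the update recovers the generating transformation, while a very large $\lambda$ forces $A$ towards zero, i.e. $A + I$ towards the identity, which is the behaviour the regularization term is meant to provide at the start of the annealing.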



We will briefly describe the solution for the correspondence, mainly for the
sake of completeness. The fourth and fifth terms are the row and column
constraints expressed via Lagrange parameters. The Sinkhorn algorithm within the
softassign process will automatically satisfy these constraints, so we do not need
to explicitly solve for the Lagrange parameters $\mu_i$ and $\nu_j$ [?]. The sixth term is
an entropy term which can also be regarded as a barrier function [?]. Solving
for $M_{ij}$ (keeping the Lagrange parameters $\mu_i$ and $\nu_j$ fixed), we get:

$$M_{ij} = \exp\!\Big( -\frac{\|X_i - (A + I) Y_j\|^2 - \alpha + \mu_i + \nu_j}{T} \Big). \qquad (5)$$

Having specified both the spatial mapping in (4) and the correspondences in
(5), we summarize the algorithm in the following pseudo-code.



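The Robust Point Matching (RPM) Algorithm

Initialize $M$, $T$, $\lambda$, $A$

Begin A: Deterministic Annealing. Do A until $T < T_{final}$

    Begin B: Softassign and Relaxation. Do B until $M$ converges or # of
    iterations > $I_0$

        $Q_{ij} \leftarrow \|X_i - (A + I) Y_j\|^2 - \alpha$
        $M_{ij} \leftarrow \exp(-Q_{ij} / T)$

        Begin C: Sinkhorn. Do C until $M$ converges or # of iterations > $I_1$
            $M_{ij} \leftarrow M_{ij} / \sum_{j=1}^{N_2+1} M_{ij}$ (row normalization)
            $M_{ij} \leftarrow M_{ij} / \sum_{i=1}^{N_1+1} M_{ij}$ (column normalization)
        End C

        $A \leftarrow [\sum_{ij} M_{ij} (X_i Y_j^T - Y_j Y_j^T)] \cdot [\sum_{ij} M_{ij} Y_j Y_j^T + \lambda I]^{-1}$

    End B

    $T \leftarrow T \cdot T_{anneal-rate}$
    $\lambda \leftarrow \lambda \cdot \lambda_{anneal-rate}$

End A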



3.3 Global/Piecewise Affine Registration and Warping

Given two brains’ sulcal point-sets, the registration is done in two steps. The first 
step finds the global affine transformation to account for the overall translation, 
orientation, scale and skew. After that, we further allow each sulcus to move 
locally to refine the alignment by solving for a piecewise affine transformation 
for each of them. To make sure that the sulcus only does local adjustment, 
the regularization is increased compared to the first step so that only small 
transformations are allowed. 

We then propagate the transformations found for the sulcal points to
the whole 3D volume to generate a 3D warping. A weighted linear combination of
all the sulci’s piecewise affine transformations is calculated based on the shortest
distance between a voxel and each sulcus. More specifically, we have a total
number of $N$ sulci, the $n$th of which is denoted by a set of points
$\{x_l^{(n)}, l = 1, 2, \ldots\}$, and a set of affine transformations $A^{(n)}$, $n = 1, 2, \ldots, N$.
For the current voxel $Y_{ijk}$ (a location other than the sulcal points, at which the
transformation is unknown and needs to be calculated), the shortest distance to the
$n$th sulcus is found, $d_{ijk}^{(n)} = \min_l \|Y_{ijk} - x_l^{(n)}\|$, $n = 1, 2, \ldots, N$.
A set of weights is then defined as:

$$w_{ijk}^{(n)} = \frac{1 / d_{ijk}^{(n)}}{\sum_{n=1}^{N} 1 / d_{ijk}^{(n)}} \qquad (6)$$

and the final voxel transformation is the weighted summation of all $A^{(n)}$. This
is done for each voxel to warp the entire volume.

$$A_{ijk} = \sum_{n=1}^{N} w_{ijk}^{(n)} A^{(n)} \qquad (7)$$
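The propagation of the piecewise affine transformations to an arbitrary voxel can be sketched as follows in Python. The sketch assumes normalized inverse-distance weights, which is one reading of the weighting described above and should be treated as an assumption; the function name, the toy sulci and the small 2 x 2 matrices (used instead of 4 x 4 ones for brevity) are hypothetical.

```python
import math


def blend_affines(voxel, sulci, affines):
    """Blend per-sulcus transformations at a voxel that is not a sulcal point.

    sulci   : list of point lists, one list per sulcus
    affines : list of matrices (nested lists), in the same order as `sulci`
    Returns (weights, blended matrix). Weights are normalized inverse
    shortest distances (assumption of this sketch).
    """
    distances = [min(math.dist(voxel, p) for p in points) for points in sulci]
    inverse = [1.0 / d for d in distances]       # undefined on a sulcal point
    total = sum(inverse)
    weights = [v / total for v in inverse]
    rows, cols = len(affines[0]), len(affines[0][0])
    blended = [[sum(w * a[r][c] for w, a in zip(weights, affines))
                for c in range(cols)] for r in range(rows)]
    return weights, blended


# Toy example: the voxel is at distance 1 from sulcus 1 and 3 from sulcus 2.
sulci = [[(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(0.0, 3.0, 0.0)]]
affines = [[[2.0, 0.0], [0.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
weights, blended = blend_affines((0.0, 0.0, 0.0), sulci, affines)
```

In this toy case the weights are 0.75 and 0.25, so the nearer sulcus dominates the blended transformation, and the weights sum to one so that a voxel surrounded by sulci sharing the same transformation receives exactly that transformation.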



4 Experiments and Results

The sulcus tracing was done on an SGI graphics platform Id with a ray-casting 
technique that allows drawing in 3D space by projecting 2D coordinates of the 
tracing onto the exposed cortical surface. A screenshot of the tool is shown 
on the left in Fig. E The inter-hemispheric fissure and 10 other major sulci 
(superior frontal, central, post-central, Sylvian and superior temporal on both 
hemispheres) were extracted as point features. A sulcal point-set extracted from 
one subject is shown on the right in Fig. 1.



4.1 RPM Applied to Sulcal Point Sets 

The original sulcal point-sets normally contain around 3,000 points each. The 
point-set is first sub-sampled to have around 300 points by taking every tenth 
point. The original MRI volume’s size is 106(X) x 75(Y) x 85(Z, slices). With 
Fig. 1. Left: A screenshot of the sulcus tracing tool with some traced sulci on
the 3D MR brain volume. Right: Sulci extracted and displayed as point-sets 



that in mind, it is reasonable to assume that the average distances between
points before registration should be in the range of 10 - 100. We set our starting
temperature to be roughly in the same scale. After registration, we would expect
the average distance between corresponding points to be within a few voxels (say,
1 - 10). Our final temperature should be slightly smaller. From these considerations,
we set the RPM annealing parameters to be the following: $T_{\text{init}} = 50$,
$T_{\text{final}} = 1$, $T_{\text{anneal-rate}} = 0.95$. The regularization parameter $\lambda$ is set to force
$A$ to be small at first. We use the initial value $\lambda_{\text{init}} = \max_{ij}[P_{ij}]$ ($P$ as defined
above) and decrease it by $\lambda_{\text{anneal-rate}} = 0.8$ at the end of every temperature iteration.
As mentioned above, the idea is that the regularization should prevent the
affine from being overdetermined by the initial fuzzy correspondence; once
the algorithm starts moving towards the right correspondence, which usually
happens within the first few iterations, the regularization should be relaxed by
decreasing $\lambda$ faster than the temperature. In practice, we observed that
any annealing rate between 0.7 and 0.9 normally works quite well.
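To make the two annealing rates concrete, the following minimal sketch lists the temperature and regularization values visited by such a schedule. The stopping rule (iterate while the temperature is above its final value) and the placeholder `lam_init=1.0` are assumptions made for illustration; in the text the initial regularization is data dependent.

```python
def annealing_schedule(t_init=50.0, t_final=1.0, t_rate=0.95,
                       lam_init=1.0, lam_rate=0.8):
    """Return the (temperature, lambda) pair used at each temperature iteration.

    ASSUMPTIONS: the loop stops once T is no longer above t_final, and
    lam_init is a placeholder for the data-dependent initial value.
    """
    schedule = []
    T, lam = t_init, lam_init
    while T > t_final:
        schedule.append((T, lam))
        T *= t_rate      # slow cooling of the correspondence temperature
        lam *= lam_rate  # faster relaxation of the affine regularization
    return schedule

schedule = annealing_schedule()
```

Because 0.8 is smaller than 0.95, the ratio of the regularization to the temperature shrinks at every step, which is the "relax faster than the temperature" behavior described above.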




Fig. 2. Demonstration of the robust point matching process. Left four: 3D point 
sets and their three 2D projections in the middle of the matching. 3D point 
sets are shown as circles and crosses. Their most significant correspondences are 
shown as dotted lines. Right four: Towards the end of the matching 
Figure 2 shows one example of RPM in action. The circles and crosses stand
for two sets of sulcal points and the gray links indicate the most significant
correspondences (large $M_{ij}$) between the two point-sets at that moment.
The first snapshot is taken in the middle of the registration procedure, where one
can clearly see that the correspondence is still “fuzzy”. The second is taken towards
the end of the process, where the correspondence is close to binary, so fewer links
are seen.

4.2 A Comparison between Talairach, Voxel-based matching and 
RPM. 

We applied RPM to five sulcal point-sets and compared it with two other meth- 
ods which also use affine (and piecewise-affine) transformations for brain regis- 
tration. 

As mentioned in the review section, we suspected that the voxel-based meth- 
ods’ performance would not be as satisfying as feature-based methods for sulcal 
alignment. To test this, we compare RPM with a voxel-based affine matching 
method 12111 which maximizes the mutual information between the two volumes. 

By defining a common coordinate system, the Talairach method is a standard 
technique for brain alignment. A piecewise affine transformation is applied to 12 
rectangular regions of brains defined by landmark points of anterior and posterior 
commissures and extrema of the cortex. We used the Talairach program available 
as part of the MEDx package (from Sensor Systems Inc.) to align 5 brains. Sulcal 
points were traced on the resulting brain volumes. 

The volumes were then matched by the voxel-based method described in 
m and the resulting spatial mapping was applied to the sulcal points. RPM 
was separately run on the sulcal point-sets, and the results from both a simple
global affine transformation and piecewise affine transformations are shown in
Fig. 3. Since we register every brain to the first one, after registration the
minimum distance from each sulcal point in the current brain to the first brain is
calculated. The mean and variance of these minimum distances are calculated for
each sulcus for quantitative comparison. The results are shown in Fig. 4. The
above comparison of Talairach with RPM and voxel-based approaches shows that
RPM can significantly improve upon Talairach in most cases even though it may
have fewer degrees of freedom. The voxel-based method’s performance is mixed;
it gives bigger errors for 5 of the 11 sulci. The significant improvement over the
global affine transformation obtained by allowing piecewise transformations confirms
the importance of non-rigid transformations.
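The minimum-distance measure used for this comparison can be sketched as follows. The function names are hypothetical, the use of the population variance is an assumption (the text does not say which estimator was used), and each point is compared against the corresponding sulcus of the reference brain.

```python
import math
from statistics import mean, pvariance

def min_distances(points, reference_points):
    """For each registered sulcal point, the distance to the closest
    point of the same sulcus in the reference (first) brain."""
    return [min(math.dist(p, q) for q in reference_points) for p in points]

def sulcus_alignment_stats(points, reference_points):
    """Mean and variance of the minimum distances for one sulcus.

    ASSUMPTION: population variance; the source does not specify.
    """
    d = min_distances(points, reference_points)
    return mean(d), pvariance(d)
```

Running `sulcus_alignment_stats` once per sulcus and per method gives the kind of per-sulcus curves summarized in Fig. 4.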

4.3 3D Warping and Comparison. 

The three dimensional warping of the brain volumes is calculated from the trans- 
formations found for the sulci as described above. The insufficiency of the Ta- 
lairach alignment for the sulcal structure is clearly seen. Even though our warping 
strategy based on piecewise affine transformations is quite simple, the results 
show further improvement upon the global affine transformation. 
Fig. 3. Sulcal points alignment. The alignment results of five brains’ sulcal
point-sets on the left side of the brain are shown together. Denser, closely packed
distributions of sulcal points suggest that they are better aligned. We clearly
see the improvement of RPM over both Talairach and the voxel-based approach,
especially for RPM with the piecewise affine mapping

Two subjects’ brains as well as some of their major sulci are shown in Fig. 5.
The variability of the sulci can be appreciated from this figure.

By displaying the reference brain’s sulci on the other brain volumes warped 
using the Talairach technique, global affine transformation from RPM and piece- 
wise affine transformations from RPM, we can see the improvement of sulcal 
structure alignment by our method. We should also note that the better alignment
is accompanied by an increased degree of brain deformation. These are shown
in Figs. 0 0 andIHl 

5 Discussion and Conclusion 

Our simulation and experiments with real data indicate that sulcal point match- 
ing is a fast, robust and accurate tool for the registration of cortical anatomical 
structures. We now mention several enhancements that could further improve 
our point feature-based non-rigid registration. First, statistical shape models 
can be computed using the correspondence information returned by RPM. From 
these models, more meaningful deformation modes based on principal compo- 
nents can be constructed. Also, an arc length-based ordering of the points (akin 
to curves) can be imposed. This would have the effect of radically reducing the 
correspondence search. Finally, using a mixture model m we can extend the 
matching algorithm to the problem of matching a labeled sulcal atlas to an unlabeled
or partially labeled sulcal point-set. This would allow us to automatically
label the sulci extracted from a new brain image. We have reported preliminary 
work on matching labeled point-sets to unlabeled features (though not sulcal 
point features) elsewhere PEI- 

Since we are using point-sets, which are quite flexible - i.e. it does not matter 
if those points all lie on a curve, a surface or a more complicated geometrical 
Fig. 4. The minimum distance measure (as described in text) for each sulcus
(sulcal label: 0 - interhemispheric fissure; 1,2 - central sulcus; 3,4 - Sylvian fissure; 
5,6 - superior temporal sulcus; 7,8 - post-central sulcus; 9,10 - superior frontal 
sulcus. ) Top figure shows the comparison between Talairach (dashed line), voxel- 
based method (light dotted line) and RPM with a global affine transformation 
(solid line) . Bottom figure shows the first two again with results of RPM with 
piecewise affine transformation (light line) 




Fig. 5. Two brain volumes after Talairach alignment with their sulci are shown 
here. Left to right: Both sides of one brain A, both sides of another brain B 



object - deep cortical structures (for example representations like sulcal ribbons 
pg) can be easily incorporated into our framework. Future work will focus 
on hierarchical (labeled and unlabeled) point-set representations of the cortical 
structures. 
Abstract. This paper builds upon our previous work on elastic registration,
using surface-to-surface mapping. In particular, a methodology for
finding a smooth map from one cortical surface to another is presented, 
using constraints imposed by a number of sulcal and gyral curves. The 
outer cortical surface is represented by a map from the unit sphere to the 
surface which is obtained by a deformable surface algorithm. The sulcal 
and gyral constraints are defined as landmark curves on the outer corti- 
cal surface representation. The unit sphere is then elastically warped to 
itself in 3D using the predefined sulcal and gyral constraints, yielding a 
reparameterization of the original surface. This method is tested on MR 
images from 8 subjects, showing improved registration in the vicinity of 
the sulci used as constraints. We also describe a hierarchical framework 
for automating this procedure, by using conditional spatial probability 
distributions of cortical features on the spherical parametric domain, in 
order to automatically identify cortical features. This approach is demon- 
strated on the central and precentral sulci. 



1 Introduction 

Deformable registration has received a great deal of attention from the brain imaging
community in the past decade. Finding a spatial transformation
that morphs one brain to another is important in several applications,
including computational anatomy mi2i, functional image analysis H3|, and 
image guided neurosurgery. 

The several methods that have been proposed in the literature can be broadly 
classified into image-matching and feature-matching methods. The former are 
based on the assumption that two images to be matched have similar signal 
characteristics. Accordingly, these methods look for transformations that max- 
imize some measure of overlap or similarity between the transformed and the 
target images m- Feature-based approaches utilize distinct anatomical features 
that are first extracted from images, in order to find the morphing transforma- 
tion pil8IVI8iyilO| . We have previously reported a method that uses open or 
closed surfaces as features that drive a 3D elastic transformation [7]; similar
approaches have been pursued by other investigators mm- Features can be the
boundaries of brain structures, or the cortical sulci, which can be modeled as 
thin convoluted ribbons embedded in 3D m 

One of the important issues that needs to be considered in a surface matching 
paradigm is that there is an infinite number of possible ways in which one surface 
can be mapped to another. In the context of deformable registration, however, 
only the map that preserves anatomical homologies is meaningful. In this paper 
we present an approach for defining such a map, using distinct features of the 
cortex (Sect. 2). Moreover, in Sect. 3 we describe a framework for automating 
this procedure, by using spatial probability distributions of cortical features on a 
spherical reference domain, in conjunction with geometric properties of an indi- 
vidual’s cortical surface, in order to automatically identify these features. In this 
paper we restrict our attention to the outer cortical surface. Our methods, how- 
ever, are applicable to any anatomical surface that is parameterized on the unit 
sphere. Of particular interest is the application of our methods to the problem 
of spatial normalization of the whole cortical surface, the accurate extraction 
of which is still an open research problem |lllIbll6IIV| . in order to normalize 
structural and functional data to a common reference system. 

Using prior spatial distributions for identifying cortical sulci has been recently 
shown to be a promising approach j1 8| . The method presented in m was based 
on 3D spatial distributions of the sulci, after an overall shape normalization 
of the corresponding brain images via a 3D linear transformation. The work 
we present herein is similar in nature, but it differs in three respects. First, our 
spatial priors pertain to curves that belong to surfaces parameterized on the unit 
sphere, and are therefore applicable to any surface matching paradigm using the 
unit sphere as parametric domain. In addition to reducing the dimensionality 
of the problem by one dimension, we believe that our approach might turn out 
to produce tighter spatial priors, since the fact that we work in the parametric 
domain implies that variability in the overall shape of the brain is factored out. 
Moreover, in Sect. 3 we propose a hierarchical approach, which starts with the 
identification of the major and less variable cortical features, and then proceeds 
with more variable ones. Effectively, our method removes a certain degree of 
variability at each stage of the hierarchical matching. Our assumption, which is 
tested on variability measurements gathered from the precentral and central sulci 
of 20 subjects, is that having identified the location of, e.g. the central sulcus, 
gives us a better idea of where to look for the precentral sulcus, as opposed to 
looking for the precentral sulcus directly. 

2 Elastic Registration 

2.1 Overall Framework 

In PI we treated the problem of finding a map from one surface to another as 
a problem of finding an elastic reparameterization of one of the two surfaces, 
so that the geometric structures (quantified by the principal curvatures) of the 
two surfaces are similar on surface points with the same parametric coordinates.
That approach has been tested on a large sample of 250 images m. and it 
has demonstrated an overall good registration. However, accurate registration 
of individual sulci or gyri could not be achieved. This is due primarily to the 
complexity of the cortical structure. In particular, because of the convoluted 
nature of the cortex, we only used global geometric properties of the brain, in 
order to find a map from one surface to another. Typically, these global shape
measures highlight structures such as the inter-hemispheric and Sylvian fissures,
or the tips of the temporal and occipital lobes. Figure 1a shows the gray
matter and white matter distributions of 100 subjects after elastic warping to
the Talairach space m- Achieving a better registration in the cortical region is 
the main goal of the work reported in this paper. 




(a) (b) 



Fig. 1. (a) Triplanar display of the average distribution of gray matter from 100
normal subjects, after segmented images of these subjects were mapped to Talairach
space. Individual cortical folds were not brought into perfect registration, as reflected
by the fuzziness in the average image. (b) The average distributions of gray and white
matter for 8 subjects, after elastic warping using 9 landmark curves primarily in the 
left hemisphere (right in the images). Arrows indicate regions of good registration 
(low fuzziness) around landmark curves. Note the good registration around the central 
sulcus (top left, arrow) 



In particular, we first present a method for morphing one cortical surface 
to another, using a map between corresponding sulci and gyri; these are curves 
defined on a parameterized surface. Some investigators have previously used 
flattened representations of surfaces in order to find a morphing transformation 
from one surface to another [7]. An issue that arises in the reparameterization
of flat maps is singularities at corner points of the parametric grids or bound- 
ary constraints. In the approach we describe in this paper, we get around this 
problem by using an iterative procedure which consists of 3D surface warping 
steps followed by projections onto the parametric domain. Upon convergence, 
this procedure results in a reparameterization of a surface under constraints im- 
posed by sulcal or gyral curves. We now describe each step of our algorithm in 
further detail. 

2.2 Surface Construction 

We represent each surface by a map from the unit sphere to the surface. In 
this paper we focus on the outer cortical surface. A spherical map is obtained 
by shrink wrapping a deformable surface m, which is initialized at a spherical 
configuration. After convergence of the deformable surface to a configuration 
conforming to the outer cortex, we readily obtain a map from the sphere to the 
surface by simply following the trajectory of each point on the initial sphere. 

For the numerical implementation of the algorithm, the sphere is represented
as a tessellated icosahedron. We typically start with 2,500 vertices in order to
speed up the deformation of the deformable surface, and as the surface conforms
to the outer cortical boundary, we increase the number of vertices by subdividing
the triangles. The final-resolution surface is sampled with 40,000 vertices,
and at convergence, each point on the tessellated sphere is mapped to a point on
the outer cortical surface. This map will be denoted by $x(u, v)$, where $(u, v)$ is a
pair of parametric coordinates on the sphere, such as longitude and latitude.
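As a rough check on these vertex counts, the sketch below tracks how many vertices, edges and faces an icosahedron has after repeated subdivision. It assumes the standard 1-to-4 midpoint subdivision of each triangle, which the text does not state explicitly; under that assumption the counts after four and six rounds are close to the quoted 2,500 and 40,000.

```python
def subdivided_icosahedron_counts(levels):
    """(vertices, edges, faces) of an icosahedron after `levels` rounds of
    1-to-4 triangle subdivision (ASSUMED subdivision scheme)."""
    v, e, f = 12, 30, 20
    for _ in range(levels):
        # one new vertex per edge; each edge splits in two and each face
        # gains three interior edges; each face becomes four faces
        v, e, f = v + e, 2 * e + 3 * f, 4 * f
    return v, e, f

coarse = subdivided_icosahedron_counts(4)
fine = subdivided_icosahedron_counts(6)
```

The 12 original icosahedron vertices keep five neighbors at every level, while every vertex inserted on an edge has six.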

2.3 Curvature Estimation 

The geometric structure around each point of the triangular grid of a surface is 
determined via a least-squares estimation procedure, which finds the bi-quadratic 
patch that best fits the shape of the surface in the neighborhood of a point 
El- The major difficulty in the least-squares estimation procedure is that the 
optimal size of the neighborhood used to estimate the parameters of the
bi-quadratic patch is not known in advance. In order to optimally capture the
local variations of the sulcal shape, while maximally smoothing out the noise, 
the optimal size of the neighborhood is found adaptively, as described in El- 
Although we do not impose any continuity constraints on the curvature estimates 
of neighboring vertices, the fact that the surface patches used to estimate the 
curvature on neighboring vertices overlap, for the most part, results in smoothly 
varying curvature estimates. Figures 2a,b show three-dimensional renderings of
two outer cortical surfaces. Overlaid on the surfaces is one of the two
principal curvatures, shown in blue.
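A simplified version of the patch fit can be sketched as follows. It assumes the neighborhood has already been expressed in a local frame with the vertex at the origin and the surface normal along z, fits only the three second-order terms, and uses a fixed neighborhood; the adaptive neighborhood selection and the exact patch model of the method are not reproduced here.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic_patch(neighbors):
    """Least-squares fit of z = a*x^2 + b*x*y + c*y^2 to local (x, y, z)
    samples (ASSUMED simplified patch model in a tangent-plane frame)."""
    M = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y, z in neighbors:
        phi = (x * x, x * y, y * y)
        for i in range(3):
            rhs[i] += phi[i] * z
            for j in range(3):
                M[i][j] += phi[i] * phi[j]
    d = det3(M)  # nonzero when the samples are not degenerate
    coeffs = []
    for k in range(3):  # Cramer's rule on the normal equations
        Mk = [[rhs[i] if j == k else M[i][j] for j in range(3)]
              for i in range(3)]
        coeffs.append(det3(Mk) / d)
    return tuple(coeffs)

def principal_curvatures(a, b, c):
    """Eigenvalues of the Hessian [[2a, b], [b, 2c]] at the patch origin."""
    root = math.hypot(a - c, b)
    return a + c + root, a + c - root
```

Since neighboring vertices share most of their samples, the fitted coefficients, and hence the curvatures, vary smoothly across the mesh, which matches the observation in the text.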

2.4 Defining Constraints on the Surface 

The initial parameterization, $x(u, v)$, of the surface depends on how the deformable
surface shrink-wraps around a particular brain boundary, and therefore
it depends on initialization as well as on the shape of the individual brain.
Consequently, there is no reason to expect that the parametric coordinates $(u, v)$ on
the unit sphere, $S$, correspond to the same anatomical region in two different
brains. Consider two different brain images, $\mathcal{I}_1$ and $\mathcal{I}_2$, and the corresponding
outer cortical surfaces, $x_1(u, v)$ and $x_2(u, v)$, with

$$x_1 : S \rightarrow \mathcal{I}_1 , \qquad (1)$$

and similarly for $x_2(\cdot)$. A map from $\mathcal{I}_1$ to $\mathcal{I}_2$ can be implicitly defined as a
reparameterization, $r(u, v)$, of $x_1(u, v)$. In particular, if

$$r : S \ni (u, v) \rightarrow r(u, v) \in S , \qquad (2)$$

then

$$\chi : \mathcal{I}_1 \ni x_1(r(u, v)) \rightarrow x_2(u, v) \in \mathcal{I}_2 \qquad (3)$$

defines a map from one cortical surface to the other. Our goal here is to find
the reparameterization, $r(\cdot)$, which brings the two cortical anatomies into good
correspondence. That is, $x_1(r(u, v))$ and $x_2(u, v)$ should be anatomically corresponding
regions.

We define this parameterization based on a number of landmark curves on the
sphere. In particular, let $c_1^j(l)$ and $c_2^j(l)$, $j = 1, \ldots, K$, $l \in [0, 1]$, be two sets of
curves parameterized on the unit interval and positioned on $S$. These curves are
parameterized by piece-wise constant speed parameterizations, i.e. their points
are evenly spaced in between break points along the curves. Typical curves we
use are the inter-hemispheric fissure, the central, precentral, postcentral, superior
frontal, lateral, and superior temporal sulci, or the ridge curves of the adjacent
gyri. Examples of break points are the precentral knob, intersection points of
sulci (e.g. precentral with superior frontal, central sulcus with inter-hemispheric
fissure), or distinct points such as the tips of the temporal or occipital lobes. For
the experiments of this section, the $K$ pairs of curves are defined manually on
three-dimensional renderings of the surfaces, using an OpenGL-based interface
(see Fig. 2). These curves are then mapped onto the sphere via the inverse of
the maps $x_1(\cdot)$ and $x_2(\cdot)$. In the following section we describe a framework for
automatically defining these curves using prior spatial probability distributions
in conjunction with geometric properties of the surface, such as its curvature.
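A piece-wise constant speed parameterization can be illustrated on a traced polyline. In this sketch the break points are assumed to split the unit interval into equal parts (break point k of m pieces sits at l = k/m), which is a choice made here for illustration, and the curve is treated as a plain 3D polyline rather than a curve on the sphere.

```python
import math
from bisect import bisect_right

def cumulative_lengths(points):
    """Arc length from the first vertex to each vertex of a polyline."""
    lengths = [0.0]
    for p, q in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.dist(p, q))
    return lengths

def point_at(points, break_indices, l):
    """Point at parameter l in [0, 1].

    break_indices lists the polyline vertex indices of the break points,
    including the first (0) and the last vertex. ASSUMPTION: every piece
    between consecutive break points gets an equal share of [0, 1].
    """
    n_seg = len(break_indices) - 1
    seg = min(int(l * n_seg), n_seg - 1)   # piece that l falls in
    frac = l * n_seg - seg                 # relative position in that piece
    s = cumulative_lengths(points)
    s0, s1 = s[break_indices[seg]], s[break_indices[seg + 1]]
    target = s0 + frac * (s1 - s0)         # constant speed within the piece
    i = min(bisect_right(s, target) - 1, len(points) - 2)
    t = (target - s[i]) / (s[i + 1] - s[i])  # assumes distinct vertices
    return tuple(a + t * (b - a) for a, b in zip(points[i], points[i + 1]))
```

With this construction, the same value of l on two subjects' curves always falls between the same pair of anatomical break points, which is what makes the pointwise correspondences between curve pairs meaningful.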

2.5 Surface Reparameterization 

By construction, the $K$ pairs of curves provide point correspondences on the
surface. In particular,

$$\forall l \in [0, 1], \quad c_1^j(l) \leftrightarrow c_2^j(l) .$$

We use these point correspondences to elastically warp the sphere to itself, and
therefore find the reparameterization function, $r(u, v)$. In principle, this can be
formulated as a 2D transformation problem. However, in order to avoid difficulties
introduced by discontinuities on singular points (e.g. on the two poles of a
polar coordinate system) or along boundaries of flat maps, we have formulated
this problem as a 3D problem, i.e. as a problem of warping $S$ onto itself via an
iterative procedure. Successive projections back onto $S$ guarantee that the result
is the desired two-dimensional transformation, $r(\cdot)$, from $S$ to $S$. This iterative
algorithm is described in detail below.

Let $p(u, v)$ be the position of a vertex of $S$ at some time point during the
iterative procedure. As we will discuss below, $p(u, v)$ does not necessarily remain
on $S$ during our iterative algorithm, but is continuously projected onto $S$. Let,
also, $s(u, v)$ be the 3D unit vector of the point on the unit sphere that has
parametric coordinates $(u, v)$. Finally, let $f(p(u, v))$ be a force field defined for
each vertex point $(u, v)$ as follows:

$$f(p(u, v)) = \begin{cases} s(c_2^j(l)) - p(u, v), & \text{if } (u, v) = c_1^j(l) \text{ for some } l \in [0, 1],\ j \in \{1, \ldots, K\}, \\ 0, & \text{otherwise.} \end{cases}$$

That is, $f(\cdot)$ is nonzero only on the $K$ landmark curves. Let, also, $e(u, v)$ be the
sum of elastic forces applied from the neighbors of $p(u, v)$. [Since the vertices of
the grid on the sphere result from successive tessellations of an icosahedron, all
but 12 points of this grid have 6 neighbors; the rest have 5 neighbors.] Then, we
find the function $r(u, v)$ with the following iterative algorithm:

$$\begin{aligned} p^0(u, v) &= s(u, v) \\ p^{t+1}(u, v) &= \mathcal{P}\{\, p^t(u, v) + \delta^t [\, f(p^t(u, v)) + L\, e(p^t(u, v)) \,] \,\} \\ r(u, v) &= s^{-1}(p^T(u, v)) \end{aligned} \qquad (4)$$

where $T$ is the maximum number of iterations, and $\mathcal{P}\{\cdot\}$ denotes the operator
that projects a point radially onto the unit sphere.

According to this iterative algorithm, the vertices of the unit sphere move
in the three-dimensional space under the influence of attractive forces between
corresponding curves, which are interpolated by elastic forces. As soon as a
vertex moves away from the unit sphere, however, it is projected back on the unit
sphere, and the algorithm continues until convergence. Convergence is achieved
when the $K$ curves in $S$ that correspond to landmark curves in $\mathcal{I}_1$ are very
close to (in the absence of elastic forces, they coincide with) their counterparts
corresponding to $\mathcal{I}_2$. Convergence is generally achieved after 200 iterations.
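The move-then-project iteration of Eq. (4) can be sketched on a toy mesh. Several things here are assumptions made for illustration rather than details taken from the text: the elastic force is modeled as neighbors pulling a vertex's displacement toward their own, the step size `delta` and elastic weight `L` are fixed constants, and an octahedron stands in for the tessellated icosahedron.

```python
import math

def normalize(v):
    """Radial projection of a 3D point onto the unit sphere."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def warp_sphere(s, neighbors, targets, delta=0.2, L=0.1, iterations=200):
    """s: initial unit vectors, one per vertex; neighbors[i]: indices of the
    vertices adjacent to vertex i; targets: {landmark vertex index: target
    unit vector}. Returns the final vertex positions on the sphere."""
    p = list(s)
    for _ in range(iterations):
        disp = [tuple(a - b for a, b in zip(pi, si)) for pi, si in zip(p, s)]
        new_p = []
        for i, pi in enumerate(p):
            # external force: nonzero only on landmark vertices
            if i in targets:
                f = tuple(t - a for t, a in zip(targets[i], pi))
            else:
                f = (0.0, 0.0, 0.0)
            # ASSUMED elastic force: smooths the displacement field
            e = tuple(sum(disp[j][k] - disp[i][k] for j in neighbors[i])
                      for k in range(3))
            moved = tuple(a + delta * (fk + L * ek)
                          for a, fk, ek in zip(pi, f, e))
            new_p.append(normalize(moved))  # project back onto the sphere
        p = new_p
    return p
```

With `L=0.0` a landmark vertex simply slides along the sphere until it coincides with its target, mirroring the remark that the curves coincide in the absence of elastic forces; with a positive `L` the neighbors are dragged along and the landmark stops slightly short of the target.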

2.6 3D Elastic Warping 

After the surface correspondences are defined via the map $\chi$ (see (3)), $\mathcal{I}_1$ is
elastically transformed to $\mathcal{I}_2$. This transformation (STAR) has been described
in detail elsewhere [7]. Since most of our research subjects are elderly individuals,
we have adopted a framework of prestrained elasticity in order to account for
ventricular expansion that is typical in these individuals.
2.7 Experiments

In our first experiment we considered two MR images, and we outlined several
sulci, as shown in Fig. 2. The maximum curvature is shown in gray in the 3D
renderings of the outer cortical surface. The positions of the outlined curves on
the unit sphere are shown in Fig. 2c, with white corresponding to Fig. 2b and
black to Fig. 2a. We then elastically reparameterized the surface of Fig. 2b. The
new positions of the white curves, together with the target (black) curves, are
shown in Fig. 2d. The grid is overlaid on the two renderings of the unit sphere
in order to appreciate how the curve deformation is interpolated in the rest of
the vertices.




(a) (b) (c) (d) 

Fig. 2. (a),(b) 3D renderings of two outer cortical surfaces, with landmark curves
overlaid on them (white). (c) The position of the landmark curves in the parametric
domain (the unit sphere). The curves corresponding to (a) are shown in black and the
curves corresponding to (b) are shown in white. (d) An elastic reparameterization of
the sphere, so that the two sets of curves have similar parametric coordinates



We then applied this method to 3D MR images from 8 individuals. These 
images were first segmented into gray matter, white matter, and CSF, using a 
Markov Random Field method described in E2|. A deformable surface was then 
fitted to the gray matter/CSF boundary of each image. Based on the resulting 
surfaces, we then defined the following curves: inter-hemispheric and Sylvian 
fissures, central, precentral, postcentral, superior frontal, and superior temporal 
sulci, and the medial axis of the inferior aspect of the temporal lobe, for the left 
hemisphere only (right in the images, according to the radiology convention) . We 
used the first of the 8 images as the target image, and we applied our algorithm 
to reparameterize the remaining 7 images. Finally, we used the STAR algorithm 
to warp the 7 images into conformation to the target. 

In order to visualize the degree to which the warped images were in registration,
we calculated the average distributions of the gray and white matter,
which are displayed in Fig. 1b. In Fig. 1b, relatively fuzzier regions imply a relatively
poorer registration, whereas crisp regions imply a good registration, since
in these regions cortical gyri are aligned and therefore averaged together. Notable
is the very good registration in the vicinity of regions in which constraining
sulcal curves were used. Those regions are marked with arrows in the triplanar
display; notable is the almost perfect registration in the precentral knob (upper
left), which is thought to be the region controlling hand movement. Note that
even though we only used sulcal constraints on one of the two hemispheres, a
relatively good registration is also apparent in the contralateral sites. This is
primarily due to the symmetry of the brain, and to the elastic forces. For example,
if we correctly map the intersection of the left central sulcus with the
inter-hemispheric fissure across individuals, then we are very likely to also map
the intersection of the right central sulcus with the inter-hemispheric fissure, since
these two points are identical under perfect symmetry.

3 Hierarchical Labeling of the Sulci 

3.1 Overall Framework 

Our experiments in the previous section demonstrated that cortical constraints 
are important in bringing the highly variable cortical regions into registration. 
However, the manual definition of the sulcal curves is a laborious procedure (in 
the experiments of the previous section, defining the set of curves for each brain 
required approximately 25 minutes of a trained person’s time). Therefore, we 
have investigated an approach for automatically labeling major cortical features. 

Our approach is based on a hierarchical labeling of a number of landmark 
curves in the parametric domain, i.e. in the unit sphere S; the labels propagate 
to the sulcal curves that are embedded in 3D via the surface parameterization 
x{u,v). Labeling of the landmark curves is achieved by elastically matching a 
“template unit sphere” , which contains statistics of each landmark curve that are 
collected from a training set, to the unit sphere holding the parameterization of 
a particular brain’s surface. The statistics currently provide prior knowledge of 
the expected location of each landmark curve and its variability. This matching 
is done hierarchically. In the simplest case, only one landmark curve is considered
at each stage of this hierarchical procedure.

One could, in principle, use the spatial distributions on S to label all cortical 
features simultaneously. However, the high variability of many cortical features, 
such as the folds of the prefrontal cortex, might be an impediment. Our reason 
for using a hierarchical scheme is that the variability of certain cortical features 
can be reduced if measured relative to other, less variable features. For example, 
the precentral sulcus is a relatively more variable feature than the central sulcus. 
However, if we know the location of the central sulcus, we can make a better guess
as to where the precentral sulcus might be; this is shown quantitatively using data
from a sample of 20 of our subjects in Sect. 3.2. Accordingly, in our hierarchical
matching scheme, we use conditional spatial probability distributions of sulci on 
S, which we will refer to as CSPD’s. For example, assume that, in a particular 
subject, the central sulcus has been somehow labeled. Then, the outer cortical 
surface of that subject can be reparameterized as described in the previous 
section, so that its central sulcus has the same position on the unit sphere as the 
average central sulcus of the training set. The CSPD of the precentral sulcus, 
conditioned on the fact that the central sulcus is coincident with its average 
in the training set, is presumably tighter, i.e. it has lower variance, and it can
subsequently be used for labeling the precentral sulcus. We now describe the details
of this algorithm. We have focused on the central and precentral sulci, in order 
to better understand how our methods behave. 

3.2 The Spatial Probability Distributions 

A training set of 20 normal subjects was randomly selected from our database.
Parameterizations of the outer cortical surface were then found for these subjects,
as described in Sect. 2.2. The central and precentral sulcal curves were then
outlined for all subjects. This resulted in 20 pairs of curves, each parameterized
in the unit interval by a piece-wise constant speed parameterization, as described
in Sect. 2.4. For each $l \in [0, 1]$, the corresponding sulcal curve point was assumed
to follow a Gaussian distribution, which was estimated via the mean and the
covariance matrix.
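Estimating such a pointwise Gaussian amounts to computing a sample mean and covariance at each sampled parameter value. The sketch below assumes the curves have already been resampled at the same parameter values for every subject, treats the point coordinates as ordinary Euclidean coordinates (ignoring the curvature of the sphere), and uses the unbiased (n - 1) covariance estimator; none of these choices is specified in the text.

```python
def mean_and_covariance(samples):
    """Sample mean and covariance of a list of equal-length coordinate tuples.

    ASSUMPTION: unbiased (n - 1) normalization.
    """
    n, dim = len(samples), len(samples[0])
    mu = [sum(p[k] for p in samples) / n for k in range(dim)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in samples) / (n - 1)
            for b in range(dim)] for a in range(dim)]
    return mu, cov

def curve_distribution(curves):
    """curves[s][i] is the point of subject s at the i-th sampled parameter.
    Returns one (mean, covariance) pair per sampled parameter value."""
    n_samples = len(curves[0])
    return [mean_and_covariance([c[i] for c in curves])
            for i in range(n_samples)]
```

The sequence of means traces the average curve, and the trace of each covariance matrix gives a single variability value per curve point.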

The spatial distribution of the precentral sulcus was calculated conditioned on the
fact that the location of the central sulcus was known. More specifically, we
reparameterized all 20 surfaces so that the central sulci were all aligned to their
average position on the unit sphere. The precentral sulci were transformed accordingly.
Subsequently, the CSPD of the (transformed) precentral sulci was
calculated. Figure 3 shows the 90%-thresholded regions of the central sulcus,
and of the precentral sulcus after alignment of the central sulcus.

In order to test the hypothesis behind the hierarchical formulation of the 
sulcal matching, we calculated the variance along each point of the precentral 
sulcus with and without aligning the central sulcus first. The resulting variances 
are shown in Fig. 4a. The reduction in the variance is clear, and it is due to the
fact that the locations of the central and precentral sulci are correlated with each
other. In Fig. 4b we show the variance along the superior frontal sulcus, before
and after alignment of the central sulcus. As expected, the variance does not
change much, since the positions of these two sulci are not highly correlated.

3.3 Hierarchical Labeling 

Consider, for example, the average shape of the curves on S that correspond to 
the central sulcus in the training set. We will loosely refer to this curve as the 
average central sulcus, having in mind that the true shape of the average central 
sulcus is actually found via the map from its average position on S to 3D. We 
achieve the labeling of an individual’s central sulcus by mapping S onto itself, 
so that the average central sulcus is mapped to the individual’s central sulcus,




Fig. 3. The CSPD’s (90% regions), derived from 20 normal subjects, of the central
sulcus (black), and the lateral portion of the precentral sulcus (gray) conditioned
on the alignment of the central sulcus with its average

Fig. 4. Plots of the variance along the precentral sulcus (a), and the superior frontal
sulcus (b) before (solid line) and after (dotted line) alignment of the central sulcus with
a fixed curve (namely, the average of its spatial distribution)



thereby transferring its label. This is accomplished via the reparameterization
procedure described in the previous section, with the force field f(u, v) being
nonzero only on the central sulcus. More generally, a different curve (or set of
curves) is considered at different hierarchical levels of our algorithm. Therefore,
in general, the force field f(u, v) is nonzero on the curve(s) considered at a
particular stage of the hierarchical matching procedure.

In (2), the target curves, C2(l), which determined the force field f(u, v), were
predefined on S since they were manually drawn in advance. However, here,
the target curve is not known in advance, but is calculated at each iteration.
In particular, consider the hierarchical level in which the central sulcus is to
be labeled in an individual’s surface. At each iteration in this stage, a search
in the neighborhood of each point on the average central sulcus is performed,
looking for points for which the subject’s surface has high curvature; for sulcal
curves we use the minimum principal curvature, whose absolute value is high on
the sulci, while for gyral curves we use the maximum curvature. The center of
mass of the high curvature points is then calculated. The collection of the center
of mass points forms the target curve, C2(l), at each iteration in (2). Clearly,
as an average sulcus deforms towards its shape in an individual’s brain, the
target curve, formed by the collection of the center of mass points, is reevaluated
continuously. This mechanism is shown schematically in Fig. 5.
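
The recomputation of the target curve at each iteration can be sketched as below. This is only an illustration under stated assumptions: a fixed search `radius` stands in for the CSPD-defined search window, `threshold` is a hypothetical cutoff on the absolute curvature, the center of mass is unweighted, and all names are invented for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def target_curve(
    curve_points: List[Point],
    surface_points: List[Point],
    curvatures: List[float],
    radius: float,
    threshold: float,
) -> List[Tuple[float, ...]]:
    """For each point of the current curve, return the unweighted center of
    mass of nearby surface points whose absolute curvature is high.

    If no high-curvature point lies within the search radius, the curve
    point itself is returned, so that no force would act on it.
    """
    targets = []
    for c in curve_points:
        near = [
            p
            for p, k in zip(surface_points, curvatures)
            if math.dist(p, c) <= radius and abs(k) >= threshold
        ]
        if near:
            dim = len(c)
            targets.append(
                tuple(sum(p[d] for p in near) / len(near) for d in range(dim))
            )
        else:
            targets.append(tuple(c))
    return targets
```

Calling `target_curve` again after every deformation step mirrors the continuous reevaluation described in the text.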

In equilibrium, the average central sulcus is mapped to the central axis of 
the subject’s sulcus, and the rest of the points on the unit sphere are mapped 
accordingly to some other location on the unit sphere. The inverse of this trans- 
formation is a reparameterization of the subject’s surface, for which the subject’s 
central sulcus has exactly the coordinates of the average central sulcus of the 
training set. In the subsequent stage of the hierarchical matching procedure, the 
precentral sulcus, which has been mapped to a new location during the previous 
stage, is used as the driving-force curve. The central sulcus is fixed to its average 
configuration. 

Fig. 5. A schematic representation of the hierarchical labeling of the sulci. (a) The
average central and precentral sulci overlaid on the curvature map of an individual
surface. (b) Stage during which the average central sulcus is mapped to the individual’s
central sulcus, via the center of mass forces originating from high curvature regions,
as shown schematically by the arrow. (c) Stage during which the precentral sulcus is
labeled, after alignment of the central sulcus. (d) The inverse transformation, which
effectively reparameterizes the individual’s surface so that its central and precentral
sulci have the parametric coordinates of the sample average



We have used two different mechanisms for calculating the center of mass 
forces. First, we have weighted the center of mass force by the probability of 
the corresponding location, using the corresponding CSPD. Second, we use the 
CSPD only to define the maximum search window, but we don’t weight the forces 
by the value of the probability. The former mechanism is, in some cases, slightly 
more robust. However, the latter allows more flexibility in the transformation, 
and hence we have adopted it in our experiments. 



3.4 Experiments 

Figure 6a shows the average central sulcus (bottom curve in white) overlaid on
a subject’s surface; this curve was obtained by following the map x(·) of that
subject, starting from the average parametric coordinates of the central sulcus
on S. The lateral part of the precentral sulcus is also shown (top curve in white).
Figure 6b shows the same curves after reparameterization of the subject’s surface
in the first level of the hierarchical procedure, in which the central sulcus was
used as the driving curve. The result of the subsequent stage is shown in Fig. 6c,
in which the precentral sulcus was used as driving curve. The corresponding
transformations of the unit sphere, demonstrating the elastic reparameterization
of the subject’s surface, are shown in Figs. 6d-f.

One could, in principle, apply this mechanism without computing the inverse 
transformation, as follows. First, transform the unit sphere so that the average 
central sulcus is mapped to the central sulcus of the individual’s surface, as de- 
scribed above. Then do the same with the precentral sulcus, and so on. The 
difficulty in this approach would be in estimating how the probability distribu- 
tion of the precentral sulcus (and all the other sulci) is warped from its initially 
Gaussian form, as the surface itself is warped. In contrast, when mapping the 
individual’s curvature pattern to match the average sulci through the inverse 
transformation described above, the CSPD’s of all sulci remain Gaussian and 
easily computable. 

Fig. 6. An example of the automated labeling of the central sulcus and the lateral
portion of the precentral sulcus. (a) The curves having the parametric coordinates of
the average central sulcus (bottom curve in each figure) and average precentral sulcus
(top curve in each figure) on the unit sphere, which is shown in (d). (b) The same
curves, after elastic reparameterization of the unit sphere, as shown in (e), using the
central sulcus as driving curve. (c) The same curves, after the reparameterization of
the unit sphere at the second hierarchical level (f), in which the precentral sulcus was
used as driving curve



4 Summary and Discussion 

We presented a methodology for deformable brain registration, which aims at 
improving registration accuracy in the cortical region, by using landmark curves 
such as the outer edge of a sulcus or a gyrus. In this work we have focused on 
the outer cortical surface. The landmark curves are used to find a map between 
two surfaces to be registered. 

We presented a framework for obtaining the surface reparameterization, 
which avoids problems introduced by singularities of the parametric domain, 
such as poles. In particular, we determine the reparameterization of a surface 
parameterized on the unit sphere by a sequence of three-dimensional elastic 
transformations followed by projections onto the unit sphere. 

We also presented a framework for automatically determining this surface- 
to-surface map via a hierarchical procedure using conditional spatial probability 
distributions. This procedure is based on the premise that the variability of a 
highly variable curve is a composite of its own intrinsic variability and of the 
variability of certain less variable curves. For example, the variability of the pre- 
central sulcus is a composite of its own intrinsic variability and of the variability 
of the adjacent central sulcus. Therefore, if the central sulcus is identified, then 
it is reasonable for one to look for the precentral sulcus in the vicinity of the 
central sulcus and at a certain distance from it. We use conditional spatial
distributions of the sulci in order to define the region in which to look for a
particular sulcus, given that all the sulci in the previous stages of this hierarchical
procedure have been identified.

Our work on the automated identification of cortical curves is still at a pre- 
liminary stage. Several issues need to be addressed. First, we need to precisely 
determine strong correlations between sulci and gyri, which will define which 
landmark curves depend highly on others, and therefore it will determine the 
sequence in which these curves must be visited by our hierarchical procedure. 
Purely based on the development of the brain, one would expect that the most 
stable features are the inter-hemispheric and Sylvian fissures, which are formed 
in relatively early embryonic life. The central sulcus is one of the features ap- 
pearing next. Although the adjacent precentral and postcentral sulci do not 
necessarily follow, it is reasonable to consider them after the central sulcus in 
our hierarchical scheme, since their position naturally depends on the position 
of the central sulcus. 

The second issue to be investigated is a model for representing variability on 
the structure of landmark curves, in addition to our current model of variability 
in their position. For example, certain sulci are often interrupted. Moreover, parts 
of certain sulci tend to be interrupted more often than others. This information 
can be readily incorporated into our CSPD’s. For example, labels attached to 
each point on a sulcal curve, in addition to its average position on the unit 
sphere and its covariance matrix, can be the probability of being interrupted, 
its average depth, curvature and torsion. This information can help
resolve ambiguities introduced by the high variability of the cortical morphology. 

Abstract. This paper presents a series of 3D statistical models of the
cortical sulci. They are built from points located automatically over the 
sulcal fissures, and corresponded automatically using variants on the Iter- 
ative Closest Point algorithm. The models are progressively improved by 
adding in more and more structural and configural information, and the 
final results are consistent with findings from other anatomical studies. 
The models can be used to locate and label anatomical features auto- 
matically in 3D head images for analysis, visualisation, classification, and 
normalisation. 



1 Introduction 

The aim of this work is to build statistical models of the cortical sulci from a 
set of example (training) images. These models can provide an insight into the 
biological variability present in cortical configurations and can be used in Active 
Shape Model (ASM) searches to locate and label the sulci in unseen images. 
Since many of the sulci demarcate functional areas [29], this provides a basis
for labelling the cortical surface providing a standard frame within which to 
analyse functional and structural change in disease. The form of the statistical 
models and their incorporation into ASM search are described briefly in Sect. 3. 
The method requires that ‘landmark’ points are found for each member of a set 
of training images and that a one-to-one correspondence be established between 
these sets of landmark points for each pair of training images. Because the struc- 
tures are so complex, it was desirable to develop automated methods for finding 
the landmark points and establishing the correspondences. A simple data-driven
method, described in Sect. 4, was developed for generating the landmark points
over the mouths of the sulcal fissures. Automatic correspondence is based on the
iterative closest point (ICP) algorithm [4,41]. Naive ICP gives poor results, but
incorporating structural and configural information (Sect. 6) results in
significantly better models that capture forms of variability already known to be
present in the configuration of the cortical surface [2]. Quantitative
and visualisation results are given in Sect. 8 and examples of sulcal and cortical 
labelling using ASM search are shown in Sect. 9. 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 196 ff., 1999.

2 Background

We are interested in methods for locating brain structures automatically. Many
attempts to segment the cortex have relied on clustering (which tends to be
more effective with multi-sequence data) or image morphology, but
these techniques only provide gross structure for visualisation and not
information about the location of particular areas. Generally, anatomical
labelling has required an expert user to label structures of interest manually.
Computerised anatomical atlases have been developed, but these tend to be
derived from as little as one subject brain. Such atlases usually deform to the
new example using intensity information or a combination of prescribed
transformations, sometimes user driven. These take little or no account of the
variation of the object class; the resulting labelling may therefore be
unreliable. In order to extract and label structures automatically, the
information must be incorporated into the model used, as in earlier work where
3D models of structures in the head were developed from a training set of
examples. In these cases the structures chosen were considerably simpler than the
cortical surface, which meant that manual delineation and point correspondence
were possible. Subsol et al. developed more complex 3D models of skull
and cortical ridges from sets of examples. The ridges were detected and matched
automatically to produce an atlas, but the authors chose to use mathematical
modal analysis based on physical structure rather than statistical observation.
Sandor & Leahy developed a model which could locate and label a small
number of sulci, but it had no built-in knowledge of their structural variations and
configurations. Other authors have analysed the sulcal variability with a view to
labelling detected structures automatically or interactively, but
generally this has been restricted to a small number of major fissures, or only a
small number of manually labelled and corresponded examples were used [20]. The
work presented here is fully 3D, uses automatically marked and corresponded
data, and considers the whole of the exposed surface, building on earlier less
complex experiments which used poorer data.

3 Active Shape Models 

Active Shape Models (ASM) have as their basis the Point Distribution Model
(PDM). PDMs have been used to model many classes of variable objects ranging
from faces to electrical components. Full details of the PDM can be found
elsewhere, but the following gives a brief description. Given a set of example pattern
vectors {$\mathbf{x}_i \in R^n$}, where correspondence is established between the values at each
index of $\mathbf{x}_i$, then each vector can be rewritten:



$\mathbf{x}_i = \bar{\mathbf{x}} + E \mathbf{p}_i$    (1)

where $\bar{\mathbf{x}}$ is the mean pattern vector, E is the matrix whose columns are the
eigenvectors of the covariance matrix of the set, and $\mathbf{p}_i$ is an n-dim vector of



198 A. Caunce and C. J. Taylor 



parameters describing the degree to which $\mathbf{x}_i$ varies from $\bar{\mathbf{x}}$ in a way described
by the corresponding eigenvectors. Each eigenvector describes the way in which
linearly correlated $x_{ij}$ move together over the set, referred to as a ‘mode of
variation’. New examples, not included in the training set, can be generated by
manipulating the elements of $\mathbf{p}$. To model objects in three dimensions the {$\mathbf{x}_i$}
are constructed using the co-ordinates of descriptive features of each example.
The features must correspond to the same ‘points’ on each object. Given
co-ordinates $(x_{ij}, y_{ij}, z_{ij})$ at each feature j of object i, the shape vector is:

$\mathbf{x}_i = (x_{i1}, y_{i1}, z_{i1}, x_{i2}, y_{i2}, z_{i2}, \ldots, x_{in}, y_{in}, z_{in})$    (2)

Appropriate features may be corners, edges, borders, surface patterns, etc. 
These can often be identified and corresponded by hand, but for extremely complex
subjects, like the cortical surface, an automated approach is preferable. A
PDM can be adapted for search by recording grey-level intensity data in a 
neighbourhood around each point and progressively adapting the model shape 
through an unseen image until the neighbourhood matches are optimal (within 
model constraints) . The final positions of the model enable those specific features 
to be labelled (see Sect. 9). 
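
As a small illustration of the two formulas in this section, the sketch below builds a shape vector by concatenating the feature co-ordinates and generates a new example as the mean shape plus a weighted sum of modes. The function names are hypothetical, and the modes are assumed to be supplied already (the eigen-decomposition of the covariance matrix is not shown).

```python
from typing import List, Sequence


def shape_vector(points: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate (x, y, z) co-ordinates of the features into one vector."""
    return [float(c) for p in points for c in p]


def synthesise(
    mean_shape: Sequence[float],
    modes: Sequence[Sequence[float]],
    params: Sequence[float],
) -> List[float]:
    """Return mean_shape + sum_k params[k] * modes[k].

    modes[k] is the k-th eigenvector (a column of E); params holds the
    corresponding elements of the parameter vector p.
    """
    x = list(mean_shape)
    for mode, b in zip(modes, params):
        for idx, e in enumerate(mode):
            x[idx] += b * e
    return x
```

Setting all parameters to zero returns the mean shape; varying one parameter at a time moves the points along a single mode of variation.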

4 Automatically Placing Landmarks 

In order to generate PDMs, landmark points must be placed on significant
features over a set of training examples. The sulcal fissures were chosen because
they provide anatomical landmarks and can be used as a diagnostic aid.
Since the cortical surface is extremely convoluted, with complex sulcal
configurations, it was necessary to develop an automated process to do this.
Inspired by volume visualisation techniques, a projection method was developed
which locates points above the mouths of the sulcal fissures on the cortical
envelope or hull. The images used were 22 full 3D acquisitions obtained on a
1.5T machine with a 3D Fourier-transform spoiled-gradient-recalled sequence.
Each has 124 slices, 1.5 mm thick, at 256x256 resolution, optimised for good T1
contrast. Firstly, the brain is segmented from the skull semi-automatically using
region growing in ANALYZE. Using a closing operation, the cortical hull is
produced, and the grey levels of the brain image are averaged along the surface
normals up to a specified depth. The averaged intensity is then projected onto
the hull. The intensity values can then be thresholded to leave a representation
of the sulcal fissures on the hull, which is finally thinned to produce the
landmark points. Fig. 1 illustrates the projection approach and Fig. 2 shows the
whole process.

Once the points are obtained they can be allocated to curve segments. This is
done automatically and curves are delimited by joins (more than 2 neighbours)
and endpoints (1 neighbour). Fig. 3 shows a point set labelled in this way. The
grey-level intensities were projected onto the hull to a depth of 5 units and the
grey-level threshold was set at 0.8 standard deviations below the mean of the
distribution.
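
The threshold rule and the neighbour-count rule above can be sketched as follows. This is a simplified illustration: the thinned point set is assumed to lie on a 4-connected 2D grid (the paper works on the cortical hull), the population standard deviation is assumed, and the function names are invented.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def sulcal_threshold(values: Iterable[float], k: float = 0.8) -> float:
    """Grey-level threshold set k standard deviations below the mean."""
    data = list(values)
    return mean(data) - k * pstdev(data)


def classify_points(points: Set[Cell]) -> Dict[Cell, str]:
    """Label each thinned point by its number of 4-connected neighbours:
    1 neighbour -> endpoint, more than 2 -> join, otherwise interior
    (or isolated when it has no neighbour at all)."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    labels: Dict[Cell, str] = {}
    for r, c in points:
        n = sum((r + dr, c + dc) in points for dr, dc in offsets)
        if n == 0:
            labels[(r, c)] = "isolated"
        elif n == 1:
            labels[(r, c)] = "endpoint"
        elif n > 2:
            labels[(r, c)] = "join"
        else:
            labels[(r, c)] = "interior"
    return labels
```

Curve segments would then be the runs of interior points between consecutive joins and endpoints.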

Fig. 1. The average grey-level intensity along the surface normal is projected on to
the cortical envelope

Fig. 2. The projection process: Segmented brain (a), cortical
envelope (b), projection image (c), thresholded projection (d),
thinned point set (e)



5 Iterative Closest Point Matching 

The problem of establishing matches between the point sets falls into two parts: 
finding the global alignment of the points and then the specific point correspondences.
The ICP algorithm [4,41] tackles both of these problems simultaneously,
although the first rather better than the second. The algorithm is run through a 
series of iterations; at each step the closest points (Euclidean distance) between 
sets are found and then, based on these matches, one (or both) of the sets is 
brought more closely into alignment with the other by adjusting global pose 
parameters. Specifically, for this application, each training example is matched 
to a master example chosen arbitrarily. This is not an ideal method and future 




Fig. 3. Point set labelled 
with short curve segments 




Fig. 4. Point set labelled 
with long curve segments 
(Sect. 6.4.1) 
work will attempt to use pairwise-tree matching to eliminate the dominance
of one set. The point correspondences are established in both directions and
any matches outside a particular distance limit (2 standard deviations above
the mean) are disallowed. The pose of the matched set only is modified to bring
it into alignment with the master. This is continued until either the pose
adjustment is sufficiently small or a specific number of iterations have been
performed. In order to build the model, all allowed matches (in either direction)
to each point on the master are averaged to produce matches for that set. Due
to the distance limit imposed, not all points in each set will be included in
the model. This basic algorithm produced a model with good shape descriptive
properties but with poor configurational representation. That is, simply matching
the closest point takes no account of the actual variability of structures (and
their mutual configuration) between examples; the model can therefore readily
represent shape variations but cannot accurately reproduce the variability of
cortical patterns (see Sect. 6). To rectify this, certain modifications were made
to the basic algorithm by incorporating local shape and pattern information into
the matching metric, and by taking into consideration the branching and breaking
of structures. This improved the way the algorithm tackled part two of the
matching problem: the specific point correspondences.
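
The basic loop described in this section can be sketched as below. This is a deliberately simplified variant, not the authors' implementation: the pose is restricted to a translation (the paper adjusts global pose parameters, which would normally also include rotation and scale), and the function names, `max_iter` and `tol` are assumptions made for the example.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Sequence[float]


def closest(p: Point, pts: List[Point]) -> int:
    """Index of the point in pts nearest to p (Euclidean distance)."""
    return min(range(len(pts)), key=lambda j: math.dist(p, pts[j]))


def allowed_matches(moving: List[Point], master: List[Point]) -> List[Tuple[int, int]]:
    """Closest-point matches in both directions, with any match further
    than mean + 2 standard deviations disallowed."""
    pairs = {(i, closest(p, master)) for i, p in enumerate(moving)}
    pairs |= {(closest(q, moving), j) for j, q in enumerate(master)}
    dists = {pair: math.dist(moving[pair[0]], master[pair[1]]) for pair in pairs}
    limit = mean(dists.values()) + 2 * pstdev(dists.values())
    return [pair for pair, d in dists.items() if d <= limit]


def icp_translation(moving, master, max_iter: int = 50, tol: float = 1e-9):
    """Translation-only ICP: only the moving set is modified."""
    moving = [tuple(float(c) for c in p) for p in moving]
    dim = len(moving[0])
    for _ in range(max_iter):
        pairs = allowed_matches(moving, master)
        # Least-squares translation for the current matches.
        shift = [mean(master[j][k] - moving[i][k] for i, j in pairs) for k in range(dim)]
        moving = [tuple(c + s for c, s in zip(p, shift)) for p in moving]
        if math.sqrt(sum(s * s for s in shift)) < tol:
            break  # pose adjustment is sufficiently small
    return moving
```

The two stopping conditions in the loop correspond to the ones named in the text: a sufficiently small pose adjustment or a fixed number of iterations.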

6 Incorporating Structural and Configural Information 

6.1 Local Attributes 

There are several local attributes that are suitable for inclusion in some kind 
of similarity measure in order to find the ‘closest’ point for the ICP. Over the
course of the experiments presented here, items used were as follows. 

6.1.1 3D co-ordinates. Compared using Euclidean distance: 



where $d_{ij}$ is the distance between two points, i on one set and j on the other.
6.1.2 Surface normal. Considering angles between 0 and 180 degrees: 



where $n_i$ is the unit surface normal at point i and $\delta$ is a small value designed
to prevent division by 0, say 1e-5.

6.1.3 Curve segment direction. The principal directions of the curve 
segments containing the points i and j are compared considering angles between 
0 and 90 degrees: 



where i is the unit vector principal direction.

6.1.4 Neighbourhood histogram. This is a 2D low-resolution representation
of the local configuration of sulcal fissures about a point, see Fig. 5. The
histogram is centred on each point and aligned with the direction of its associ- 
ated curve. The bins represent the number of points in a particular area of the 
local neighbourhood as projected from 3D into 2D. Since a curve effectively has 
2 directions and a point can belong (join) to several curves, each point can have 
many histograms associated with it. 




Fig. 5. The neighbourhood histogram about a point. The grid is centred over the point 
and aligned with the principal direction of its curve segment. The bins represent the 
number of points in that area of the neighbourhood 



They are compared by concatenating the rows into a 1D normalised vector
and using a dot product:

$d^{h}_{ij} = \frac{1 + \delta}{\max\{h_i \cdot h_j,\ \delta\}}$    (6)

where $h_i$ is the normalised vector representing the histogram of point i.
This assumes that corresponding points will have similar sulcal configurations 
around them. 
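
The comparison of two neighbourhood histograms can be sketched as follows. Only the steps stated in the text are shown (concatenating rows, normalising, taking a dot product); normalisation to unit Euclidean length and the function names are assumptions, and the conversion of the dot product into a distance is not reproduced here.

```python
import math
from typing import List, Sequence


def normalised_vector(hist: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the rows of a 2D histogram and scale to unit length.

    An all-zero histogram is returned unchanged (as zeros).
    """
    flat = [float(v) for row in hist for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def histogram_similarity(h1, h2) -> float:
    """Dot product of the two normalised histogram vectors (1 = identical
    configuration up to scale, 0 = no bins in common)."""
    a, b = normalised_vector(h1), normalised_vector(h2)
    return sum(x * y for x, y in zip(a, b))
```

Since a point can have several histograms (one per curve direction), a matcher built on this would presumably keep the best score over all pairs of histograms.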



6.2 Curve Segments 

Although the basic ICP algorithm attempts to match corresponding points, there
is no provision for ensuring that points from the same structures correspond.
To do this, the curve segment information was introduced, and curve matching
was decided on a voting basis; i.e. after point correspondences were established
(by whatever means) each curve was considered matched to the curve with the most
point matches. After that, corresponding points between curves were established
using simple Euclidean distance.
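
The voting step can be sketched as below. The data layout is hypothetical: `point_matches` is a list of (i, j) index pairs, and `curve_of_a` / `curve_of_b` map point indices to curve identifiers in the two sets. Ties are broken by whichever candidate curve was seen first, which is an arbitrary choice for this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def match_curves_by_vote(
    point_matches: Iterable[Tuple[int, int]],
    curve_of_a: Dict[int, Hashable],
    curve_of_b: Dict[int, Hashable],
) -> Dict[Hashable, Hashable]:
    """Match each curve in set A to the curve in set B that receives the
    most point matches from it."""
    votes = defaultdict(Counter)
    for i, j in point_matches:
        votes[curve_of_a[i]][curve_of_b[j]] += 1
    return {ca: tally.most_common(1)[0][0] for ca, tally in votes.items()}
```

After this step, point correspondences would be recomputed within each matched pair of curves only.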

6.3 Size Variation 

Not only is the variation in sulcal configuration extreme but each structure can 
vary in size. In order to ensure that points are correctly matched between such 
structures, they are aligned at their centres of gravity (COG), and then one is 
scaled to match the extent of the other. Fig. 6 illustrates the process and it can
be seen that this correspondence is superior to taking the closest point on the 
original structures. 
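
A minimal sketch of the align-and-scale step follows. The text does not define "extent", so the largest distance of any point from the centre of gravity is assumed here; the function names are also invented for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> Tuple[float, ...]:
    """Centre of gravity (COG) of a point set."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def extent(points: List[Point], cog: Point) -> float:
    """Assumed definition of extent: largest distance from the COG."""
    return max(math.dist(p, cog) for p in points)


def align_and_scale(points: List[Point], reference: List[Point]) -> List[Tuple[float, ...]]:
    """Move `points` so that its COG coincides with that of `reference`,
    then scale it about the COG to match the reference extent."""
    cog_p, cog_r = centroid(points), centroid(reference)
    ext_p, ext_r = extent(points, cog_p), extent(reference, cog_r)
    scale = ext_r / ext_p if ext_p > 0 else 1.0
    return [
        tuple(cr + scale * (c - cp) for c, cp, cr in zip(p, cog_p, cog_r))
        for p in points
    ]
```

Closest points would then be searched between the transformed structure and the reference, rather than between the original structures.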

Fig. 6. Matched structures are aligned and scaled before points are corresponded

6.4 Combining Curves 

Due to the effects of breaking and branching in particular, a large structure 
can consist of several small curve segments as described in Sect. 4. The basic 
ICP algorithm is not affected by this but, once the concept of curve segments is 
introduced, it becomes an issue. One-to-one matching of curves using a voting 
scheme will in many cases fail to match some parts correctly, if at all, and two 
methods were introduced to combat this as follows. 

6.4.1 Joining curve segments. By examining the principal directions and 
proximity of curve segments, suitable candidates could be joined to form longer 
curves. The three angles between the axes and the line joining the COGs must 
all be close to 0, and the curves must be sufficiently close to allow joining, see
Fig. 7. The same point set of Fig. 3 is shown in Fig. 4 with long curve segments.

6.4.2 Matched linked sets. This method relies on linking curves after 
matching. For this scheme a chain of matches is established and all the included 
points on each set are considered as one structure for the purposes of establishing 
closest points. Figure 8 shows this process which should allow varying numbers
of segments on the same structure to match successfully. 




Fig. 7. Testing for a valid join. The angles and distance are considered. (It is
unlikely these two curves would be joined)

Fig. 8. Matched linked sets. A chain of curve matches is established (left) and all
the points in each set are allowed to match as one structure to give uniform closest
points (right)

7 Experiments

Various combinations of the modifications described in the previous section were 
used to generate models. However, for the purposes of this paper we present the 
most significant. 

Method 1. Basic ICP (Euclidean distance - eqn (3)). 

Method 2. Adding normals and axes (6.1.2 & 6.1.3) and then matching 
curve segments (6.2). The distance between points i and j becomes: 



Method 3. As Method 2 but allowing matched linked sets (6.4.2) and scaling
and aligning (6.3).

Method 4. As Method 3 but using long curve segments (6.4.1) and incorpo- 
rating neighbourhood histograms of size 40 with a resolution of 8 pixels (6.1.4). 



8 Assessing the Models 

Since ground truth is not known, a self-contained method had to be developed 
to assess the models. A PDM represents an object class by combining the mean 
shape with a linear combination of the principal components of the variation 
over the set. Each principal component is one way in which all the points move 
together and is referred to as a ‘mode of variation’. Since these models represent 
the configurations of structures rather than isolated features, it is reasonable to 
assume that most neighbouring points should be connected. This means that 
they should move in a similar fashion within each mode. A measure was de- 
vised, therefore, which assesses the degree to which neighbours move together. 
A coherence value is calculated for each mode: 



where N is the total number of points in the model, $n_i$ is the number of points in the
neighbourhood of point i, and the unit vectors are the displacement directions of
the points. Fig. 9 shows the principle. In theory this can take values between 0
and 1 but in practice will never reach the extremes. Fig. 10 shows the coherence
values for the 4 methods described in Sect. 7 when a neighbourhood size of 0.05
was used (the model is scaled to unit size). The Wilcoxon paired signed ranks
test statistics give p<0.005 for all hypotheses that the coherence values from one
method to the next are not improved.





Fig. 9. Calculating Coherence. The direction vectors of neighbours are examined
for each point

Fig. 10. Coherence values for Methods 1-4 (neighbourhood size 0.05)



It is clear from these results that adding in local geometric and topographical 
information has improved the matching metric and that the countermeasures to 
branching and breaking have been successful. It is also reasonable to assume 
that, as point correspondence becomes more precise, the number of points in the 
model may fall as structures become excluded on some examples. Table 1 shows
the number of points for each method and this supports that assumption. 



Table 1. The numbers of points in each model 



Method        1       2       3       4
No. Points    7234    6719    6715    6613



Visual inspection of the models was particularly important and a method
was devised of displaying the shape changes (movement normal to the surface -
Fig. 12) present in the model, and the pattern changes (tangential movement -




Fig. 11. The labelled cortex of the unseen image of Fig. 14 (right & left views)

Fig. 13). In all cases the mean shape is shown. The size of the point indicates the
amount it moves and the colour indicates direction, for positive model parameters,
as shown in the key. These directions are reversed for negative parameters.
It is important to note that point sizes are relative within mode only. Table 2
gives the percentage of variance explained by each mode of Method 4; these were
similar for all the models.



Table 2. The percentage variation represented by each mode from Method 4 



Mode  % Variance    Mode  % Variance    Mode  % Variance
1     8.14738       8     5.01766       15    4.02958
2     7.85342       9     4.84648       16    3.94542
3     6.56275       10    4.64844       17    3.84594
4     6.41422       11    4.43208       18    3.54666
5     5.58816       12    4.36429       19    3.49159
6     5.33754       13    4.2875        20    3.18029
7     5.11439       14    4.2394        21    1.1068



From Figs. 12 and 13 it can be seen that, as the matching method is improved,
the emphasis moves from shape to pattern change and the model becomes visibly 
more coherent (neighbouring points show similar size and colour). In fact the 
asymmetrical pattern changes of modes 1 & 4 for Method 4, localised around the
temporal lobe and Sylvian fissure, agree with observations in other anatomical
studies [2]. Also the shape change of Mode 2 in Method 1, and Mode 3 in Method
4, shows a diagonal squashing which can be interpreted as the relationship of 
the two hemispheres to each other, or torque, which is an acknowledged source 
of variation. 



9 Active Shape Model Search 

Taking the labelling scheme of Fig. [?], the model from Method 4 was used to
search an unseen original (unsegmented) image. The grey-level templates at
each model point were derived from the unprocessed images of the training
set. All the modes of the model were used since the sulcal configurations are
extremely complex, and it can be seen from Table 2 that even the minor modes
account for a substantial amount of variation. Figure 14 shows the final position
superimposed on the cortex. Ultimately anatomical labels will be attached to the
model points. Similarly, since the fissures are used to demarcate cortical areas
[?], the model can be used to locate these also. For example, the model was
fitted to one of the original training examples using an ICP-like fitting process.
At each iteration, the model points were modified (within the scope of the PDM)
towards their closest neighbour on the target set. Using the point locations as a
guide, the hull for that example was labelled for lobar regions. Using the model
points on the unseen example of Fig. 14 to calculate a 3D mapping (3D Thin
Plate Spline, TPS [7]), the labelled hull was warped to the new example. From
that position, labels were transferred to the pre-segmented cortex on a closest
point basis. Fig. 11 shows the results. It can be seen that some labels have ‘bled’
into adjoining regions. These are due to inaccuracies in the model search. This
should be alleviated by adding more examples to the model, improving its ability
to represent such complex structures.

Fig. 12. Modes 1-4 of Methods 1 (top) & 4. Showing shape change, i.e. movement
normal to the surface. Top views are shown. Key: Red & Green indicate opposite
directions

Fig. 13. Modes 1-4 of Methods 1 (top) & 4. Showing pattern change, i.e. movement
tangential to the surface. 3/4 views are shown

Fig. 14. The final model position after searching an unseen original (unsegmented)
image. The labels are those from Fig. [?]
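
The closest-point label transfer described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name transfer_labels, the brute-force distance search and the toy lobe names are all hypothetical, and the warped hull is assumed to be available as a plain array of 3D points.

```python
import numpy as np


def transfer_labels(hull_points, hull_labels, cortex_points):
    """Give each cortex point the label of its nearest (warped) hull point."""
    hull_points = np.asarray(hull_points, dtype=float)
    cortex_points = np.asarray(cortex_points, dtype=float)
    # Squared Euclidean distance between every cortex point and every hull point.
    diff = cortex_points[:, None, :] - hull_points[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    # Index of the closest hull point for each cortex point.
    nearest = dist2.argmin(axis=1)
    return np.asarray(hull_labels)[nearest]


# Toy data: two labelled hull points and three cortex points (hypothetical values).
hull = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
labels = ["frontal", "occipital"]
cortex = [[1.0, 0.0, 0.0], [9.0, 1.0, 0.0], [4.0, 0.0, 0.0]]
transferred = transfer_labels(hull, labels, cortex)
```

A purely nearest-neighbour rule of this kind has no notion of sulcal boundaries, which is one way labels could ‘bleed’ into adjoining regions when the hull is slightly misplaced.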

10 Summary 

We have shown that adding in local structural and configural information to 
the point matching metric has improved PDMs generated automatically from 
unlabelled feature data. In addition special measures to account for the variable 
fragmentation of structures have also made improvements. These models can 
then be used to search for, and label, specific features in unseen 3D images, and 
can provide a 3D mapping for an atlas to an unlabelled image. The potential 
applications include visualisation, measurement, diagnosis and normalisation. 

Abstract. An algorithm for improved automatic segmentation of gross
anatomical structures of the human brain is presented that merges the 
output of a tissue classification process with gross anatomical region 
masks, automatically defined by non-linear registration of a given data 
set with a probabilistic anatomical atlas. Experiments with 20 real MRI 
volumes demonstrate that the method is reliable, robust and accurate. 
Manually and automatically defined labels of specific gyri of the frontal 
lobe are similar, with a Kappa index of 0.657. 



1 Introduction 

Quantitative analysis of neuro-anatomical or neuro-functional data often requires
explicit regional identification of gross anatomical structures. Unfortunately, 
manual segmentation is time-consuming, subjective and error prone. Further- 
more, inter- and intra-observer variability may reduce detectability of subtle 
differences when making comparisons. Automatic structure identification from 
medical images is a difficult task, due to the anatomical variability between 
subjects, differences in subject positioning (between patients and with respect 
to standard anatomical texts), the distinct physical properties measured by the 
imaging modalities, and variability of acquisition parameters such as slice thick- 
ness and pixel size. 

It is important to note that we differentiate between classification and seg- 
mentation. We define segmentation to be the top-down regional parceling of an 
image into anatomically meaningful continuous groups of voxels; classification is 
defined to be the bottom-up (or data driven) labelling of individual voxels with a 
tissue class label without demanding spatial contiguity for a class of voxels. The 
image data represent only one measure (or a few measures in the case of
multi-spectral data) concerning the underlying anatomy, and by themselves are sufficient only
for classification. Anatomically distinct regions of the brain are differentiated 
on the basis of histology, cyto-architecture, connectivity, cyto-chemistry or func- 
tion. As such, data from external sources are required to constrain and guide 
the segmentation process. 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 210- 12^ 1999. 

© Springer-Verlag Berlin Heidelberg 1999

These external data can be represented in at least two basic forms, and this
distinction is used here as a basis to identify two main classes of methods that 
have been proposed to solve the segmentation problem for different applications. 
In the first, a symbolic mapping is created between features extracted from the 
image volume (usually small homogeneous regions) and a symbolic model of the 
anatomical structures to be segmented. 

Expert rule-based systems are often used to achieve this mapping, where
anatomical knowledge is stored explicitly along with segmentation heuristics in
semantic form such as an ‘if-then’ rule. Examples of these procedures can be
found in the work of Raya et al. [1], Chen et al. [2], Dellepiane et al. [3],
Arata and Dhawan [?] and Davis et al. [?]. Other algorithms do not explicitly
employ if-then rules to drive the segmentation. Instead, anatomical constraints
are implicitly incorporated into the procedure. Kaneda et al. [7] use
model-guided contour extraction and 3-D reconstruction to identify dilated
ventricles in CT images. Anatomical constraints have also been used by Brummer [?]
to extract brain contours from MRI. Pathology (i.e. MS lesions) can be identified
using similar techniques [?].

Registration-based segmentation procedures differ from those previously
described since they estimate a spatial transformation function that best maps
features of one data set onto another pre-labelled volume that serves as an iconic
model. These procedures are all based on the assumption that there exists a
one-to-one mapping between the brain to be segmented and the one used as a model.
In one of the first 2D examples, Broit et al. [?] used elastically-constrained
non-linear registration between a computed tomography (CT) image and a
corresponding atlas slice. This work has been continued by Bajcsy et al. [?],
extended to 3D and reposed in a probabilistic formulation [?]. Miller et al. also
use a probabilistic formulation with physically based models [?] in order
to segment individual brains by registering them to a target. We too have
developed a registration-based segmentation procedure named ANIMAL (Automatic
Nonlinear Image Matching and Anatomical Labeling) to automatically identify
structures in the brain (described in more detail in Sect. 2.4). It has been shown
to successfully segment basal ganglia structures [?], but it has not been able to
segment cortical structures satisfactorily (voxel-based overlap indices with
manual segmentations have typically been around 40-50%). There are two reasons
for this: i) there exists important variability in the topology of sulcal and
gyral patterns of the cortex. For example, how should one account for the
existence of a double Heschl’s gyrus in a subject when the pre-labelled target
has only one? This is an example of where the one-to-one relationship that
ANIMAL depends on does not hold at the cortex.¹ ii) the deformation field
estimated by ANIMAL does not have the power to unfold the cortex of one brain
and then refold it back onto a target brain. The deformation field is
bandlimited and therefore does not have high enough frequencies to introduce
(or remove) cortical folds where needed. Still, the ANIMAL procedure is able to
correctly identify structure location, position and smooth structure boundaries.

¹ Note that this problem affects not only ANIMAL, but all registration-based
segmentation procedures. Even though fluid-based methods may recover a continuous
mapping, point correspondence between model and model is ill-defined.

The procedure presented here addresses these problems. By merging the
complementary information from ANIMAL’s non-linear deformation (i.e. low
resolution region identification) with the output of a classification technique
(i.e. voxel class labels), it is possible to accurately identify specific cortical
structures from a subject’s MRI. The work presented here is most similar to that
of Zachmann et al. [?], where an iconic model (represented by a voxelated volume,
where the value in each voxel represents the probability of existence of a
structure) is used for identification of the different fluid spaces of the brain.
The work here is different in that it is fully 3D, uses non-linear registration
(instead of linear), and is applied to the entire cerebral volume, including not
only the cerebrospinal fluid (CSF) filled spaces, but deep brain structures and
cortical gyri and sulci as well.



2 Methods 

2.1 Stereotaxy 

The methodology presented here is highly dependent on the notion of stereotaxic 
space, i.e. a standardized brain-based coordinate system that yields a method of 
identification of structure location and position so that regions of interest can be 
compared between brains using standard coordinates. Like many groups in brain 
mapping research, we have chosen to use a coordinate system similar to that
defined by Talairach [20], with the origin placed at the anterior commissure, the
x-axis running from left to right, the y-axis running from posterior to anterior 
and the z-axis running from inferior to superior. 

When image volumes are transformed into this space and resampled on the 
same voxel grid such that all brains have the same orientation and size, voxel-by- 
voxel comparisons across data volumes from different populations are possible, 
since each voxel (i,j,k) corresponds to the same (x,y,z) point in the brain- 
based coordinate system. The transformation to this coordinate system also 
provides a means for enhancement of functional signals by averaging images 
in this space [?]. This paradigm allows information (anatomical, metabolic,
electrophysiological, chemical, architectonic) from different brains to be spatially
organized and catalogued by mapping all brains into the same coordinate system
[?]. Finally, in the original Talairach spirit, the coordinate corresponding to a
particular structure, as defined by an atlas in this coordinate system, can be 
used to predict its location in a subject’s brain volume when mapped into the 
same space. However, normal anatomical morphometric variability limits this 
predictive value since there remains variability in structure position even after 
linear transformation. 

We represent this variability by statistical probability anatomy maps
(SPAMs) [?]. By definition, the SPAM for any given structure is a volumetric
data set sampled in stereotaxic space, where the value at each voxel position
represents the probability of existence of that structure at that location within
the brain-based coordinate system. At each voxel, the probability is estimated
as the number of volumes containing the structure label, divided by the total
number of volumes. For example, SPAMs can be created with voxel-by-voxel
averaging of label volumes from tissue classified data from many subjects to yield
spatial priors that can be used in classification procedures. Here the SPAMs are
created from the segmented structure labels from many subjects (see Sect. 2.5)
and used as prior anatomical model information to drive the segmentation.
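
The SPAM definition above amounts to a voxel-wise average of binary label masks. The following sketch illustrates this with NumPy on toy data; the function name build_spam and the label code 7 are hypothetical, and the label volumes are assumed to be already resampled onto a common stereotaxic grid.

```python
import numpy as np


def build_spam(label_volumes, structure_id):
    """Fraction of subjects whose label volume carries structure_id at each voxel."""
    # One boolean mask per subject, stacked along a new leading axis.
    masks = np.stack([volume == structure_id for volume in label_volumes])
    # Averaging over subjects gives (count with label) / (number of volumes).
    return masks.mean(axis=0)


# Three toy 1x4 label volumes from three subjects (0 = background, 7 = structure).
subjects = [
    np.array([[0, 7, 7, 0]]),
    np.array([[0, 7, 7, 7]]),
    np.array([[0, 0, 7, 0]]),
]
spam = build_spam(subjects, structure_id=7)
```

Each value in spam lies between 0 and 1, so the map can be used directly as a spatial prior for the structure.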

2.2 MRI Preprocessing 

A number of processing steps are required to achieve segmentation. We have
combined preprocessing steps (image intensity non-uniformity correction [24],
linear registration (ANIMAL in linear mode [25]) and resampling into stereotaxic
space), cortical surface extraction (Multiple Surface Deformation or MSD [26,27]),
tissue classification (INSECT [28]), and non-linear registration (ANIMAL in
non-linear mode [?]) into a processing pipeline. These are represented
schematically in Fig. 1. Since the ANIMAL and INSECT procedures are merged to
improve segmentation, the new procedure is termed ANIMAL+INSECT. After running
this pipeline, a subject’s MRI volume can be visualized in stereotaxic space with
its corresponding tissue labels, anatomical structure labels and cortical surface,
all in 3D. The following sections describe the classification (INSECT) and
non-linear registration (ANIMAL) procedures in more detail.




Fig. 1. Processing pipeline. All MRI data are processed through the pipeline shown 
above. After preprocessing to correct for intensity non-uniformity, the data are linearly 
registered into stereotaxic space and resampled onto a 1mm isotropic grid. The resulting 
volume is automatically classified into GM, WM, and CSF components and the cortical 
surface is automatically extracted. The non-linear transformation to stereotaxic space is 
used to warp the standard probabilistic atlas onto the classified data, defining structures 
by masking tissue classes. The cortical surface is used to mask non-brain from cerebral 
structures 



2.3 INSECT 

After image intensity non-uniformity correction, stereotaxic registration and
resampling, the classification strategy used by INSECT relies on a standard
feed-forward error-backpropagation artificial neural network (ANN). Since after
resampling an (x,y,z) location in the image lattice corresponds to the same physical
(brain) location in all MRI modalities, the intensity values of all MRI modalities
at that location are used as the ANN inputs. As such, the number of ANN input
nodes is equal to the number of MRI modalities, whereas the number of output
nodes is equal to the number of tissue classes (typically white matter, gray
matter, CSF, and background). The ANN is fully connected between layers, and
contains one hidden layer with 10 nodes. Training of the network is accomplished
using a collection of fixed stereotaxic coordinates, derived from the SPAMs (or
probability maps, see Sect. 2.1) of WM, GM, and CSF. Based on these SPAMs,
any spatial location included in the training set belongs to one of the three tissue
classes with a minimum likelihood of 90%. The MRI intensity values of the
subject’s MRI acquisition at these locations are used as training input to the ANN,
with the corresponding tissue class label as the target output. After training,
the ANN is used to classify each voxel of the subject data set into WM, GM, or
CSF.
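
The selection of training locations from the tissue SPAMs can be sketched as below. This covers only the choice of training voxels, not the neural network itself; the function name select_training_voxels and the toy probabilities are hypothetical, and the 0.9 threshold follows the 90% minimum likelihood stated in the text.

```python
import numpy as np


def select_training_voxels(spams, threshold=0.9):
    """Return voxel coordinates and tissue names where one SPAM reaches the threshold."""
    names = list(spams)
    prob = np.stack([spams[name] for name in names])  # (classes, *volume shape)
    best_class = prob.argmax(axis=0)                  # most likely tissue per voxel
    confident = prob.max(axis=0) >= threshold         # keep only near-certain voxels
    coords = np.argwhere(confident)
    labels = [names[i] for i in best_class[confident]]
    return coords, labels


# Toy 1D "volumes" of tissue probabilities at three stereotaxic locations.
tissue_spams = {
    "WM": np.array([0.95, 0.50, 0.02]),
    "GM": np.array([0.03, 0.45, 0.05]),
    "CSF": np.array([0.02, 0.05, 0.93]),
}
coords, labels = select_training_voxels(tissue_spams)
```

The subject's MRI intensities at coords would then serve as network inputs, with labels as the target outputs; the ambiguous middle location is left out of the training set.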

2.4 ANIMAL 

Identification of individual brain regions, such as the caudate nucleus, planum 
temporale or superior frontal gyrus, faces two major problems. First, while 
anatomists may generally agree where a structure is located, there is often no 
consensus on exactly which part of the structure should be included or excluded. 
Secondly, the manual labelling process is time-consuming, and the position
of the chosen boundary is subjective and dependent on the level and contrast
of the image displayed. To address these difficulties we have developed
ANIMAL, an algorithm to perform this labelling automatically in 3D [?].

The ANIMAL algorithm deforms one MRI volume to match another, previ- 
ously labelled, target MRI volume. It builds up a 3D non-linear deformation field 
in a piecewise linear fashion, recursively fitting local spherical neighbourhoods. 
Each local neighbourhood from one volume is translated to achieve an optimal 
match within the other volume. The local neighbourhoods are arranged on a 
3-D grid to fill the volume and each grid node moves within a range defined by 
the grid spacing. The algorithm is applied iteratively in a multi-scale hierarchy. 
At each step image volumes are convolved with a 3D Gaussian blurring kernel 
where blurring and neighbourhood size (sphere diameter) are reduced after each 
stage. Local neighbourhood fit is measured by correlation of the blurred image 
intensities. Initial fits are obtained rapidly since at lower scales, only gross dis- 
tortions are considered, but later iterations at finer scales accommodate local 
differences at the price of increasing computational cost. Anatomical segmenta- 
tion is achieved by transforming labels from the second (target) volume onto the 
first volume, via the inverse of the spatial mapping of the 3D deformation field (see
Fig. 4-c for an example of an ANIMAL segmentation).
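
A single local matching step of the kind described above can be illustrated with a small search over translations that maximises intensity correlation. This is a simplified sketch and not the ANIMAL implementation: it assumes a cubic rather than spherical neighbourhood, integer shifts only, no blurring and no multi-scale hierarchy, and the function name best_local_shift is hypothetical.

```python
import numpy as np


def best_local_shift(source, target, centre, radius=2, search=2):
    """Find the integer shift of one neighbourhood that maximises correlation.

    Assumes centre lies at least radius + search voxels from every border.
    """
    def patch(volume, c):
        window = tuple(slice(ci - radius, ci + radius + 1) for ci in c)
        return volume[window].ravel()

    reference = patch(source, centre)
    best_shift, best_score = (0, 0, 0), -np.inf
    offsets = range(-search, search + 1)
    for dz in offsets:
        for dy in offsets:
            for dx in offsets:
                moved = (centre[0] + dz, centre[1] + dy, centre[2] + dx)
                # Correlation between the source patch and the shifted target patch.
                score = np.corrcoef(reference, patch(target, moved))[0, 1]
                if score > best_score:
                    best_shift, best_score = (dz, dy, dx), score
    return best_shift, best_score


# Toy check: the target is the source displaced by a known amount.
rng = np.random.default_rng(0)
source = rng.random((16, 16, 16))
target = np.roll(source, shift=(1, -2, 0), axis=(0, 1, 2))
shift, score = best_local_shift(source, target, centre=(8, 8, 8))
```

Repeating such a step at every grid node, and again at progressively finer scales, would give a piecewise estimate of a deformation field in the spirit of the description above.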

This method has the important advantage that it is atlas independent, since 
the labels do not take part in the fitting process. In fact, multiple atlases defined 
for different applications or by different anatomists can co-exist on the target 
volume, and each one can be mapped through the non-linear transformation
without recomputation of the latter. 

2.5 ANIMAL+INSECT 

In the standard application of ANIMAL, the target is an MRI volume from a
single subject where all of the voxels within the volume have been anatomically
labelled by a neuroanatomist to form an atlas [?]. In the ANIMAL+INSECT
paradigm described here, the target is a voxel-by-voxel intensity average of 305
MRI volumes, where each volume was automatically registered and resampled in
stereotaxic space [?]. The atlas used for segmentation was created by averaging
anatomical labels from 152 young normal subjects, collected as part of
the ICBM project [?].



Probabilistic Atlas. There are a number of problems associated with an
anatomical atlas that is based on a single subject. For example, even though the
subject may be normal, certain brain regions may represent an extreme of the normal
distribution. Also, a single-brain atlas does not contain any notion of
anatomical variability, so it is impossible to evaluate the normality of shape, size
or position of specific structures from other subjects by comparing them with
the atlas. Finally, only one cortical topology (sulcal/gyral pattern) is represented
even though large variability is known to exist [?]. Since all registration-based
segmentation strategies (ANIMAL included) are based on the assumption that
there exists a 1-to-1 homology for all structures between source and target brains,
these strategies are undefined and may fail when this correspondence does not
exist, especially at the cortex.

Many of the problems listed above are addressed by using a probabilistic
atlas, or SPAM, created from the labellings of a large ensemble of normal
subjects [?]. The SPAM atlas used here models the anatomical variability of shape,
size and topology of 91 gross anatomical structures, where each structure is
represented by a SPAM volume in stereotaxic space (see Sect. 2.1). The
ANIMAL+INSECT segmentation paradigm requires that the atlas labels be
transformed from the target space and resampled onto the subject’s MRI volume.
Resampling a large number of SPAM volumes is inefficient, since only the label
of the most likely structure at each voxel position need be transferred to the
subject’s volume for masking. Therefore, a max-probability atlas (MPA) was
created in the target space, where only the label of the most probable structure
is stored at each voxel. This volume is created once by traversing the stereotaxic
volume, voxel-by-voxel, and storing only the label of the SPAM with the highest
probability at that voxel.
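
The construction of the max-probability atlas reduces to a voxel-wise argmax over the SPAM volumes. A minimal sketch is given below; the function name max_probability_atlas and the label codes are hypothetical, and the use of 0 for voxels where every SPAM is zero is an assumption made here for illustration, not something stated in the text.

```python
import numpy as np


def max_probability_atlas(spams, labels):
    """Store, at each voxel, the label of the SPAM with the highest probability."""
    prob = np.stack(spams)                         # (n_structures, *volume shape)
    mpa = np.asarray(labels)[prob.argmax(axis=0)]  # label of the most probable SPAM
    # Assumption: voxels where all SPAMs are zero receive 0 ("no structure").
    mpa[prob.max(axis=0) == 0] = 0
    return mpa


# Two toy SPAMs over three voxels, with hypothetical structure codes 11 and 42.
spam_a = np.array([0.8, 0.4, 0.0])
spam_b = np.array([0.1, 0.6, 0.0])
mpa = max_probability_atlas([spam_a, spam_b], labels=[11, 42])
```

Because the result is a single label volume, only one resampling through the non-linear transformation is needed per subject, rather than one per structure.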

In practice, labelled data from a large number of subjects is needed to create
the atlas. Ideally, manual segmentations of all atlas structures on all subjects
should be used. Unfortunately manual identification is very time consuming (e.g.,
1 man-month was required to segment the thalamus on 200 subjects [?]), making
the ideal situation unrealistic. Here, as proof of principle, the standard ANIMAL
[?] procedure was used with a gross anatomical atlas containing 91 structures
to segment 150 data sets of young normal adults [?]. Validations of the
ANIMAL procedure have demonstrated that on average, automatic segmentations
are comparable to manual labellings for basal ganglia [?] and cortical gyri² [?],
making this solution only slightly less than ideal. These 150 segmentations were
used to create 91 SPAM volumes that were in turn used to produce the MPA
shown in Fig. 2. In addition, three other MPA models were created from 1) the
set of 71 grey matter SPAMs to create a gMPA, 2) the set of 16 white matter
SPAMs to create a wMPA and 3) the set of 4 CSF SPAMs to create a vMPA (v
for ventricular).



Merge Method. Application of ANIMAL, using the MNI305 intensity average
target and the corresponding MPA, results in a customized maximum probability
atlas (c-MPA) for the given subject (see Fig. 3). This paradigm is similar to the
typical use of the Talairach atlas in brain mapping for structure interpretation
and localization. The major advantage is that the customized atlas indicates the
most likely structure label for each voxel for a particular subject given the
anatomical variability of a normal population, instead of only a structure label
of the single target brain. The ANIMAL+INSECT methodology makes a further
improvement by incorporating tissue class information derived from the subject in
question in the following manner.

After the three c-MPA models corresponding to GM, WM and CSF are
warped and resampled, they are used as masks to assign labels to regions of the
corresponding tissue types classified by INSECT. The c-gMPA is applied to the
GM tissue class to identify the gyri of the different cerebral lobes, basal ganglia
structures and the thalamus. The c-wMPA is applied to the WM tissue class to
label the corpus callosum, the anterior and posterior limbs of the internal capsule
and the WM voxels belonging to the lobes. In the same fashion, the c-vMPA is
applied to the CSF tissue class to segment the lateral, third and fourth ventricles.
Note that while the c-MPAs actually overlap and thus may yield several different
labels for a given voxel, only the c-MPA label corresponding to the voxel’s tissue
is applied. In the same manner, partial volume effects may be accounted for if
the classification procedure outputs continuous (instead of discrete) data. For
example, sulcal CSF can be labelled as such with the c-vMPA, even though the
classifier outputs CSF voxels with a magnitude less than 1.0.
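
The masking rule described above, in which each voxel takes its label only from the c-MPA that matches its tissue class, can be sketched as follows. The tissue codes, the function name merge_labels and the toy label values are hypothetical; discrete classifier output is assumed, so the partial-volume case mentioned in the text is not covered.

```python
import numpy as np

# Hypothetical integer codes for the discrete tissue classification.
GM, WM, CSF = 1, 2, 3


def merge_labels(tissue, c_gmpa, c_wmpa, c_vmpa):
    """Label each voxel from the customized atlas that matches its tissue class."""
    merged = np.zeros_like(c_gmpa)
    for code, atlas in ((GM, c_gmpa), (WM, c_wmpa), (CSF, c_vmpa)):
        mask = tissue == code
        # Only the atlas for this tissue type contributes inside this mask.
        merged[mask] = atlas[mask]
    return merged


# Toy example: four voxels classified as GM, WM, CSF and background.
tissue = np.array([1, 2, 3, 0])
c_gmpa = np.array([10, 10, 0, 10])   # overlapping customized atlases
c_wmpa = np.array([20, 20, 20, 0])
c_vmpa = np.array([0, 30, 30, 30])
merged = merge_labels(tissue, c_gmpa, c_wmpa, c_vmpa)
```

Although the three toy atlases overlap at several voxels, each voxel ends up with exactly one label, and background voxels stay unlabelled.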

Some cortical SPAMs extend past the inner table of the skull and may extend
into the scalp with a very low (but non-null) probability, since there are no other
cerebral structure SPAMs that will compete for the maximum probability label.
When the original MPA is created, these extra voxel labels remain and will
erroneously apply a cortical label to voxels located in the skull or scalp that
were classified as GM or WM. In order to remove these incorrect labels, the
cortical surface extracted by MSD is used to create a brain mask that is applied
against the label volume.

² It is interesting to note that while individual cortical structure labellings may be in
error, SPAMs generated by averaging either manual or automatic labellings are very
similar.

Fig. 2. Max-probability atlas. These images show slices through the maximum
probability atlas (left) and the corresponding slices through the ICBM150 T1-weighted
average brain (right)

Fig. 3. Schematic of ANIMAL+INSECT merge. The non-linear transformation required to
customize the stereotaxic MPA for the subject is estimated by ANIMAL. The subject’s
MRI is classified into WM, GM and CSF classes by INSECT. The classified data are
masked by the regions in the c-MPA to segment regions on the subject’s MRI volume

Fig. 4. ANIMAL-only vs ANIMAL+INSECT. (Left to right) Coronal slice through
original MRI volume; typical zoomed result (upper left quadrant) of INSECT
classification; of ANIMAL-only segmentation; of ANIMAL+INSECT segmentation; of
manual segmentation. Note how the ANIMAL+INSECT result improves segmentation at the
cortex and the ventricles and agrees with the expert labelling

Some structures cannot be segmented using only the method described above.
For example, in the T1-weighted volumes from the ICBM data base, the medial
half of the thalamus is usually classified as GM, while the lateral half is classified
as WM and cannot be distinguished from the adjacent white matter of the
posterior limb of the internal capsule. In this case, it is impossible to apply a
regional mask to a single tissue class to extract and label the structure. Therefore,
some structure-specific segmentation rules are required. For the thalamus, the
medial border is easily defined by masking the GM tissue class with the c-gMPA.
The definition of the lateral border is completely model-based using the standard
ANIMAL(-only) segmentation technique and is equal to the lateral border of the
thalamus in the c-MPA. Similar rules are used for the head of the caudate nucleus,
putamen and globus pallidus. Once these structures are segmented, their labels
are overlaid on top of the previous segmentation result, overwriting any labels
already specified by the initial c-MPA masking process.



3 Experiments and Results 

3.1 MRI Acquisition 

The data used for the experiments described below were acquired as part of the
International Consortium for Brain Mapping (ICBM) project, a Human Brain
Mapping funded research project with the goal of building a probabilistic atlas of
human neuro-anatomy [?]. T1-weighted MRI volumes from 152 young normal
volunteers (86 male, 66 female, age 24.6 ± 4.8) were acquired using a 3-D spoiled
gradient-echo acquisition with sagittal volume excitation (TR=18, TE=10, flip
angle=30°, 140-180 sagittal slices). As described below in Sect. 3.3, frontal lobe
gyri were manually identified on twenty of these volumes.



3.2 Comparison of Segmentations 

Figure 4 shows a comparison of an ANIMAL-only segmentation, an ANIMAL+INSECT
segmentation and a manual segmentation. Not only is the ANIMAL+INSECT
segmentation improved at the cortex, where some grey-matter regions were missed
with the standard ANIMAL technique, the segmentation of the lateral ventricles is
much better as well. Where the ANIMAL technique overestimated the size of the
ventricle, the ANIMAL+INSECT result is in complete agreement with the MRI anatomy
and with the expert’s labelling. Note that there remain some discrepancies
between the ANIMAL+INSECT and the manual segmentations - especially at the
boundaries between gyri.

In order to determine how well the segmentation procedure works in general, 
we used manually segmented labels of gyri of the frontal lobes and compared 
these to automatic labellings. 

3.3 Manual Labelling

In each hemisphere, the gray matter of five pre-frontal regions (superior, medial
and inferior frontal gyri, the anterior cingulate gyrus and the orbito-frontal
gyri) was labelled by hand. The voxels for each structure were manually
identified by voxel painting using Display, a computer program developed in our
lab that shows four 2D slices (transverse, coronal, sagittal and
user-defined oblique) through the volume with arbitrary pan, zoom and intensity 
mapping on each slice. Display also includes a 3D graphics window that is capa- 
ble of displaying 3D geometric objects such as the cortical surface. The cursor 
can be placed in any of the 2D or 3D windows, and its position is simultaneously 
updated in the other views. Voxel labels are painted on any of three orthogo- 
nal views with simultaneous update in the other two. Cortical landmarks such 
as the precentral, superior and inferior frontal, cingulate, fronto-orbital, fronto- 
marginal and superior rostral sulci are identified in the 3D window and are used 
to guide the manual segmentation. Manual segmentation of the ten gyri listed 
above required approximately 10-15 hours per subject. 



3.4 Automatic Labelling 

Qualitatively, the images in Fig. 5 demonstrate that the automatic labellings of
the left superior frontal gyrus are very similar to the manual segmentations. In
fact, the grey-white border and grey-CSF borders are very similar. In some cases
however, the ANIMAL+INSECT method includes the opposite sulcal bank in the
gyral labels.

In order to compare the two methods quantitatively, we have used a similarity
measure first proposed by Dice [?]. As shown by Zijdenbos [?], this measure is
a variant of the standard chance-corrected Kappa (κ) coefficient first developed
by Cohen [?]. This measure is the same as κ when the background is infinitely
large.

When averaged over the 20 segmentations, the mean and standard deviation
of the κ variant is 0.657 ± 0.037. In order to interpret this value and put it into
context, the right-most image on the third row of Fig. 5 has a value of 0.728 (best
κ value in this experiment), while the third image in the top row has a value of
0.573 (worst κ value). Finally, the labelling of the superior frontal gyrus from
a single subject was deliberately dilated by one voxel, and the κ variant was
evaluated between the original and dilated labelling, yielding 0.725. Dilating
by 2 voxels yields 0.593.
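
For reference, the Dice similarity between two binary labellings A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on toy masks; the function name dice and the example masks are hypothetical, and the code is an illustration of the measure rather than the evaluation software used in these experiments.

```python
import numpy as np


def dice(label_a, label_b):
    """Dice similarity 2|A and B| / (|A| + |B|) between two binary label masks."""
    a = np.asarray(label_a, dtype=bool)
    b = np.asarray(label_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        raise ValueError("both labellings are empty")
    return 2.0 * np.logical_and(a, b).sum() / total


# Toy masks: three labelled voxels each, two of them shared.
manual = np.array([1, 1, 1, 0, 0])
automatic = np.array([0, 1, 1, 1, 0])
similarity = dice(manual, automatic)
```

A value of 1 indicates identical labellings and 0 indicates no overlap. Because the measure depends only on labelled voxels, thin structures such as gyri are penalised heavily by small boundary shifts, which fits the drop reported above for one- and two-voxel dilations.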

4 Discussion 

We have presented an improved method for automatic segmentation of brain
structures by merging the complementary information from ANIMAL’s non-linear
deformation regional identification with the output of INSECT’s classification
technique. The procedure presented here is completely automatic and therefore
fully objective and applicable to large ensembles of brain volumes. While the
new procedure uses two algorithms that were developed at the Montreal
Neurological Institute, the new improved segmentation method is not dependent on
these particular methods. In fact, any classification method that differentiates
tissue types and any non-linear registration method may be merged to
maximize the complementary information of both techniques. Since INSECT yields
high resolution structure information, it is no longer necessary to run ANIMAL
to fine resolutions, thus providing a considerable improvement in speed. In fact,
running times are reduced from approximately 10 hours for estimation of the
high resolution non-linear fit to less than 2 hours, including both classification
and low-resolution warping.

Fig. 5. Segmentation of the Superior Frontal Gyrus. These images compare the manual
(left) and automatic (right) segmentations of the left superior frontal gyrus on coronal
slices from 20 subjects

The qualitative results shown in Fig. 5 demonstrate that the ANIMAL+INSECT
methodology can segment individual gyri from MRI data. While the quantitative
measures presented here are not as high as we would like, we are currently
estimating intra- and inter-observer variability to put these
values into context.

At least three methodological problems remain for future work: 1) In their
current form, the cortical SPAMs do not explicitly represent the multiple
topological patterns that exist for cortical gyri. We plan to use an atlas that contains
multiple SPAM representations for specific cortical regions, where each SPAM
corresponds to a given cortical pattern for that region. 2) Structures that have a
high anatomical variability are represented by SPAMs whose size is smaller than
their true average size. These structures must be segmented using a model-only
method, similar to those described above for the segmentation of the thalamus,
caudate, putamen and globus pallidus. 3) Surface data, extracted by MSD, will be
used to refine over-defined cortical regions (e.g., where the opposite sulcal bank
is included in the segmentation of a gyrus). By using the surface information, it
will be possible to separate small disconnected regions on the cortical surface,
and then correct the gyral labelling in 3D.

Abstract. A fundamental problem with a large class of image registration
techniques is that the estimated transformation from image A to B
does not equal the inverse of the estimated transform from B to A. This 
inconsistency is a result of the matching criteria’s inability to uniquely 
describe the correspondences between two images. This paper seeks to 
overcome this limitation by jointly estimating the transformation from 
A to B and from B to A while enforcing the consistency constraint that 
these transforms are inverses of one another. The transformations are 
further restricted to preserve topology by constraining them to obey the 
laws of continuum mechanics. A new parameterization of the transfor- 
mation based on a Fourier series in the context of linear elasticity is 
presented. Results are presented using both Magnetic Resonance and X- 
ray Computed Tomography Imagery. It is shown that joint estimation of 
a consistent set of forward and reverse transformations constrained by 
linear-elasticity gives better registration results than using either con- 
straint alone or none at all. 



1 Introduction 

A reasonable but perhaps not always desirable assumption is that the mapping of
one anatomical image (source) to another (target) is diffeomorphic, i.e.,
continuous, one-to-one, onto, and differentiable. By definition, a diffeomorphic mapping
has a unique inverse that maps the target image back onto the source image.
Thus, it is a reasonable goal to estimate a transformation from image A to B that
equals the inverse of the transformation estimated from B to A, assuming
a diffeomorphic mapping exists between the images. However, this consistency
between the forward and reverse transformations is not guaranteed with many
image registration techniques.

Depending on the application, the diffeomorphic assumption may or may not
be valid. This assumption is valid for registering images collected from the same
individual imaged by two different modalities such as MRI and CT, but it is not
necessarily valid when registering images before and after surgery. Likewise, a
diffeomorphic mapping assumption may be valid for registering MRI data from
two different normal individuals if the goal is to match the deep nuclei of the
brain, but it may not be valid for the same data sets if the goal is to match the
sulcal patterns.

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 224-237, 1999.

Alternatively, diffeomorphic transformations may be used to identify areas 
where two image volumes differ topologically by analyzing the properties of the 
resulting transformation. For example, consider the problem of matching an 
MRI image with a tumor to one without a tumor. A possibly valid diffeomor- 
phic transformation would be one that registers all of the corresponding brain 
structures by shrinking the tumor to a small point. Such a transformation would 
have an unusually small Jacobian which could be used to detect or identify the 
location of the tumor. Conversely, consider the inverse problem of matching the 
image without the tumor to the one with the tumor. A valid registration in this 
case may be to register all of the corresponding brain structures by allowing the 
transformation to “tear” (i.e., not be diffeomorphic) at the site of the tumor. 
Just as valid could be a diffeomorphic transformation that registers all of the 
corresponding brain structures by allowing the transformation to stretch at the 
site of the tumor. 

As in the previous examples, we will assume that a valid transformation
is diffeomorphic everywhere except possibly in regions where the source and
target images differ topologically, e.g., in the neighborhood of the tumor. For
the remainder of this paper, we will consider registration problems for which the
diffeomorphic transformation assumption is valid. These ideas can be extended to
certain non-diffeomorphic mapping problems by including boundary conditions
to model, isolate or remove regions that differ topologically.

Transformations that are diffeomorphic maintain topology, guaranteeing that
connected subregions remain connected, neighborhood relationships between
structures are preserved, and surfaces are mapped to surfaces. Preserving
topology is important for synthesizing individualized electronic atlases; the knowledge
base of the atlas may be transferred to the target anatomy through the
topology preserving transformation, providing automatic labeling and segmentation.
If the total volume of a nucleus, ventricle, or cortical subregion is an important
statistic, it can be generated automatically. Topology preserving transformations
that map the template to the target also can be used to study the physical
properties of the target anatomy such as mean shape and variation. Likewise,
preserving topology allows data from multiple individuals to be mapped to a
standard atlas coordinate space. Registration to an atlas removes individual
anatomical variation and allows information from many experiments to be
combined and associated with a single canonical anatomy.

The forward transformation h from image T to S and the reverse
transformation g from S to T are pictured in Fig. 1. Ideally, the transformations h and g
should be uniquely determined and should be inverses of one another. Estimating
h and g independently very rarely results in a consistent set of transformations
due to a large number of local minima. As a result, we propose to jointly
estimate h and g while constraining these transforms to be inverses of one another.
The joint estimation makes intuitive sense in that the invertibility constraint
will reduce the number of local minima because the problem is being solved
from two different directions. Although uniqueness is very difficult to achieve in
medical image registration, the joint estimation should lead to more consistent
and biologically meaningful results.

G. E. Christensen




Fig. 1. The transformation h maps the image volume T to S and the transformation g 
maps S to T. In order for the mappings to be biologically meaningful, h and g should 
be inverses of one another 



The need to impose the invertibility consistency constraint depends on the 
particular application and on the correspondence model used for registration. In 
general, registration techniques that do not uniquely determine the correspon- 
dence between image volumes should benefit from the consistency constraint. 
This is because such techniques often rely on minimizing/maximizing a similar- 
ity measure which has a large number of local minima/maxima due to correspon- 
dence ambiguity. Examples include similarity measures based on features in the 
source and target images such as image intensities, object boundaries/surfaces, 
etc. In theory, similarity measures have more local minima as the dimension of 
the transformation increases. A registration method that determines the corre- 
spondence between images by minimizing an image intensity similarity measure 
is considered in this paper. 

Methods that use specified correspondences for registration will benefit less
or not at all from the invertibility consistency constraint. For example, landmark
based registration methods implicitly impose an invertibility constraint because
the correspondence defined between landmarks is the same for estimating the
forward and inverse transformations. However, specifying correspondences has
drawbacks: user interaction is required to specify landmarks, unique
correspondences can not always be specified, and such methods usually only provide
coarse registration due to the small number of correspondences specified.

2 Registration Algorithm 

2.1 Problem Statement 

The image registration problem is usually stated as: Find the transformation h
that maps the template image volume T into correspondence with the target
image volume S. Alternatively, the problem can be stated as: Find the
transformation g that transforms S into correspondence with T. For this paper, the
previous two statements are combined into a single problem and restated as:

Problem Statement: Jointly estimate the transformations h and g
such that h maps T to S and g maps S to T subject to the constraint
that $h = g^{-1}$.

It is assumed that the 3D image volumes T and S are medical imaging
modalities such as MRI, fMRI, CT, cryosection imagery, etc. collected from
similar anatomical populations. Each image is defined as a function of
$x \in \Omega = [0,1]^3$, where $\Omega$ is called the image coordinate system. The
transformations are vector-valued functions that map the image coordinate system
$\Omega$ to itself, i.e., $h : \Omega \mapsto \Omega$ and $g : \Omega \mapsto \Omega$.
Diffeomorphic constraints are placed on h and g so that they preserve topology.
Throughout it is assumed that $h(x) = x + u(x)$, $h^{-1}(x) = x + \tilde{u}(x)$,
$g(x) = x + w(x)$ and $g^{-1}(x) = x + \tilde{w}(x)$, where $h(h^{-1}(x)) = x$
and $g(g^{-1}(x)) = x$. All of the fields $h$, $g$, $u$, $\tilde{u}$, $w$, and $\tilde{w}$
are (3 × 1) vector-valued functions of $x \in \Omega$.

Registration is defined using a symmetric cost function C(h, g) that describes
the distance between the transformed template T(h) and target S, and the
distance between the transformed target S(g) and template T. To ensure the desired
properties, the transformations h and g are jointly estimated by minimizing the
cost function C(h, g) while satisfying diffeomorphic constraints and inverse
transformation consistency constraints. The diffeomorphic constraints are enforced by
constraining the transformations to satisfy laws of continuum mechanics [2].



2.2 Symmetric Cost Function 

The main problem with image similarity registration techniques is that
minimizing the similarity function does not uniquely determine the correspondence
between two image volumes. In addition, similarity cost functions generally have
many local minima due to the complexity of the images being matched and the
dimensionality of the transformation. It is these local minima (ambiguities) that
cause the estimated transformation from image T to S to be different from the
inverse of the estimated transformation from S to T. In general, this becomes
more of a problem as the dimensionality of the transformation increases. To
overcome this problem for 3 × 3 linear transformations, Woods et al. average
the forward and inverse linear transformations to reconcile differences between
pairwise registrations.

To overcome correspondence ambiguities, we jointly estimate the
transformations from image T to S and from S to T. This is accomplished by defining a cost
function to measure the shape differences between the deformed image $T(h(x))$
and image $S(x)$ and the differences between the deformed image $S(g(x))$ and
image $T(x)$. Ideally, the transformations h and g should be inverses of one another,
i.e., $h(x) = g^{-1}(x)$. The transformations h and g are estimated by minimizing
a cost function that is a function of $(T(h(x)) - S(x))$ and $(S(g(x)) - T(x))$. The
cost function used in this work is given by

\[
C_1(T(h),S) + C_1(S(g),T) = \int_\Omega |T(h(x)) - S(x)|^2 \, dx + \int_\Omega |S(g(x)) - T(x)|^2 \, dx. \tag{1}
\]

Alternatively, a mutual information cost function could be used.
Notice that this joint estimation approach applies to both linear and non-linear
transformations.
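As a rough illustration of the symmetric cost in (1), the following one-dimensional sketch evaluates both terms on a lattice. It is only a toy under stated assumptions: the images are synthetic Gaussian profiles, the helper names `warp_1d` and `symmetric_cost` are hypothetical, linear interpolation (`np.interp`) stands in for whatever resampling the author used, and the example shifts are simple translations rather than maps of the unit interval onto itself.

```python
import numpy as np

def warp_1d(image, transform, x):
    """Sample `image` (given on lattice x) at the points transform(x)."""
    return np.interp(transform(x), x, image)

def symmetric_cost(T, S, h, g, x):
    """Discrete version of C1(T(h),S) + C1(S(g),T) on a uniform lattice."""
    dx = x[1] - x[0]
    forward = np.sum((warp_1d(T, h, x) - S) ** 2) * dx
    reverse = np.sum((warp_1d(S, g, x) - T) ** 2) * dx
    return forward + reverse

x = np.linspace(0.0, 1.0, 257)
T = np.exp(-((x - 0.45) / 0.1) ** 2)   # synthetic template
S = np.exp(-((x - 0.55) / 0.1) ** 2)   # synthetic target (template shifted by 0.1)

identity = lambda p: p
h = lambda p: p - 0.1                   # T(h(x)) = T(x - 0.1) matches S(x)
g = lambda p: p + 0.1                   # S(g(x)) = S(x + 0.1) matches T(x)

cost_identity = symmetric_cost(T, S, identity, identity, x)
cost_matched = symmetric_cost(T, S, h, g, x)
print(cost_identity, cost_matched)
```

The matched pair of shifts drives both terms close to zero, while the identity leaves a large residual in each direction.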



2.3 Transformation Parameterization 

A 3D Fourier series representation is used to parameterize the forward and
inverse transformations. This parameterization is simpler than the
parameterizations used in our previous work and each basis coefficient can be
interpreted as the weight of a harmonic component in a single coordinate direction.
The displacement fields are constrained to have the form

\[
u(x) = \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} \mu_{ijk} \, e^{\hat{\jmath} \langle x, \omega_{ijk} \rangle}
\quad \text{and} \quad
w(x) = \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} \eta_{ijk} \, e^{\hat{\jmath} \langle x, \omega_{ijk} \rangle} \tag{2}
\]

where $\mu_{ijk}$ and $\eta_{ijk}$ are (3 × 1), complex-valued vectors, $\hat{\jmath} = \sqrt{-1}$,
and $\omega_{ijk}$ is the frequency vector of the harmonic with indices $(i, j, k)$.
Notice that this parameterization is periodic in x and therefore has cyclic
boundary conditions for x on the boundary of $\Omega$. The coefficients $\mu_{ijk}$ and $\eta_{ijk}$ are
constrained to have complex conjugate symmetry during the estimation
procedure.

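Proposition 1. Each displacement field in (2) is real and can be written as

\[
u(x) = 2 \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} \sum_{i=0}^{N/2-1} a_{ijk} \cos\langle x, \omega_{ijk} \rangle - b_{ijk} \sin\langle x, \omega_{ijk} \rangle \tag{3}
\]

if the (3 × 1) vector $\mu_{ijk} = a_{ijk} + \hat{\jmath}\, b_{ijk}$ (with $\hat{\jmath} = \sqrt{-1}$) has complex conjugate symmetry.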
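The claim of Proposition 1 can be checked numerically in one dimension. The sketch below is an illustration only: it uses a scalar 1D series with frequencies 2πk/N on an integer lattice (an assumption made for the example, not the paper's 3D setting), and it treats the DC and Nyquist coefficients separately, a detail that the compact form (3) glosses over.

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
mu = rng.normal(size=N) + 1j * rng.normal(size=N)

# Impose complex conjugate symmetry: mu[N-k] = conj(mu[k]).
mu[0] = mu[0].real            # DC term must be real
mu[N // 2] = mu[N // 2].real  # Nyquist term must be real
for k in range(1, N // 2):
    mu[N - k] = np.conj(mu[k])

n = np.arange(N)

# Complex exponential form: u[m] = sum_k mu[k] * exp(j*2*pi*k*m/N).
u = np.array([np.sum(mu * np.exp(2j * np.pi * n * m / N)) for m in range(N)])
max_imag = np.max(np.abs(u.imag))

# Cosine/sine form: 2*(a cos - b sin) over the first half of the harmonics.
a, b = mu.real, mu.imag
k = np.arange(1, N // 2)
phase = 2 * np.pi * np.outer(n, k) / N
u_real = (a[0] + a[N // 2] * (-1.0) ** n
          + 2 * (np.cos(phase) @ a[k] - np.sin(phase) @ b[k]))

print(max_imag, np.max(np.abs(u.real - u_real)))
```

Both printed numbers should be at round-off level, which confirms that the symmetric coefficients produce a real field that equals the cosine/sine expansion.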

Proof. Notice that (2) can be written as

\[
u(x) = \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} \sum_{i=0}^{N/2-1} \mu_{ijk} \, e^{\hat{\jmath} \langle x, \omega_{ijk} \rangle} + \mu^{*}_{ijk} \, e^{-\hat{\jmath} \langle x, \omega_{ijk} \rangle}
\]

because the $\mu_{ijk}$ are complex conjugate symmetric. Simplifying the summand
gives the result. □

2.4 Inverse Transformation Consistency Constraint

Minimizing the cost function in (1) is not sufficient to guarantee that the
transformations h and g are inverses of each other. The inverse transformation
consistency constraint is enforced by minimizing the squared difference between the
transformation h and the inverse transformation of g, and vice versa. To
state this mathematically we define the following relationships: $h(x) = x + u(x)$,
$h^{-1}(x) = x + \tilde{u}(x)$, $g(x) = x + w(x)$ and $g^{-1}(x) = x + \tilde{w}(x)$. The consistency
constraint is enforced by minimizing

\[
C_2(u, \tilde{w}) + C_2(w, \tilde{u}) = \int_\Omega \|u(x) - \tilde{w}(x)\|^2 \, dx + \int_\Omega \|w(x) - \tilde{u}(x)\|^2 \, dx. \tag{4}
\]

The inverse transformation $h^{-1}$ is estimated from h by solving the
minimization problem $h^{-1}(y) = \arg\min_x \|y - h(x)\|^2$ for each y on a discrete lattice in $\Omega$.
The inverse $h^{-1}$ exists and is unique if h is a diffeomorphic transformation, i.e.,
continuous, one-to-one, and onto.
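The lattice-wise minimization that defines the inverse can be sketched in one dimension as follows. This is a brute-force illustration under stated assumptions, not the author's solver: the function name `invert_1d`, the dense candidate grid, and the example displacement are all hypothetical choices made for the sketch.

```python
import numpy as np

def invert_1d(h, y_lattice, n_candidates=10001):
    """For each y, return the candidate x in [0, 1] minimizing |y - h(x)|^2."""
    xs = np.linspace(0.0, 1.0, n_candidates)
    hx = h(xs)
    sq_dist = (y_lattice[:, None] - hx[None, :]) ** 2
    return xs[np.argmin(sq_dist, axis=1)]

def u(x):
    # Small displacement that keeps h monotonic: |u'(x)| <= 0.1*pi < 1.
    return 0.05 * np.sin(2.0 * np.pi * x)

def h(x):
    return x + u(x)

y = np.linspace(0.0, 1.0, 65)          # discrete lattice in the unit interval
h_inv = invert_1d(h, y)                # estimate of h^{-1}(y)
u_tilde = h_inv - y                    # displacement of the inverse, h^{-1}(y) = y + u~(y)
max_err = np.max(np.abs(h(h_inv) - y)) # how well h(h^{-1}(y)) = y holds
print(max_err)
```

The residual is limited by the spacing of the candidate grid; a practical implementation would more likely refine each estimate with a local search instead of enumerating candidates.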

2.5 Diffeomorphic Constraint 

Minimizing the cost function in (4) does not ensure that the transformations h
and g are diffeomorphic transformations except for when $C_2(u, \tilde{w}) + C_2(w, \tilde{u}) = 0$.
To enforce the transformations to be diffeomorphic, we use continuum
mechanical models such as linear elasticity and viscous fluid. For this paper,
a linear-elastic constraint of the form

\[
C_3(u) + C_3(w) = \int_\Omega \|Lu(x)\|^2 \, dx + \int_\Omega \|Lw(x)\|^2 \, dx \tag{5}
\]

was used to enforce the diffeomorphic property where $h(x) = x + u(x)$ and
$g(x) = x + w(x)$. The operator L has the form
$Lu(x) = -\alpha \nabla^2 u(x) - \beta \nabla(\nabla \cdot u(x)) + \gamma u(x)$ for
linear elasticity, but in general can be any nonsingular linear differential operator.

Following a previously described approach, the operator L can be considered a (3 × 3)
matrix operator. Discretizing the continuous partial derivatives of L, it can be
shown that (5) has the form

\[
C_3(u) + C_3(w) = \sum_{k=0}^{N-1} \sum_{j=0}^{N-1} \sum_{i=0}^{N-1} \mu_{ijk}^{\dagger} D_{ijk}^{2} \mu_{ijk} + \eta_{ijk}^{\dagger} D_{ijk}^{2} \eta_{ijk} \tag{6}
\]

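where † is the complex conjugate transpose. $D_{ijk}$ is a real-valued, (3 × 3)
symmetric matrix whose elements depend on the constants $\alpha$, $\beta$ and $\gamma$ and on
cosines of the harmonic frequencies $2\pi i/N$, $2\pi j/N$ and $2\pi k/N$. Each diagonal
element $d_{11}$, $d_{22}$, $d_{33}$ combines an $\alpha$-weighted sum of the terms
$(1 - \cos(2\pi i/N))$, $(1 - \cos(2\pi j/N))$ and $(1 - \cos(2\pi k/N))$ with a
$\beta$-weighted term of the same $1 - \cos(\cdot)$ form and the constant $\gamma$. The
off-diagonal elements satisfy $d_{12} = d_{21}$, $d_{13} = d_{31}$ and $d_{23} = d_{32}$, and are
$\beta$-weighted differences of cosines of the differenced and summed frequency
indices, e.g., terms in $(i - j)$ and $(i + j)$ for $d_{12}$.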


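The $1 - \cos(\cdot)$ terms in $D_{ijk}$ are what one would expect when derivatives are replaced by finite differences acting on Fourier harmonics. As a hedged illustration (a 1D periodic second difference with unit spacing is assumed here; the paper's exact discretization is not reproduced), the sketch below checks that a harmonic is an eigenvector of the negative second difference with eigenvalue $2(1 - \cos(2\pi k/N))$.

```python
import numpy as np

N = 16
k = 3                                   # harmonic index (arbitrary choice)
n = np.arange(N)
v = np.exp(2j * np.pi * k * n / N)      # k-th Fourier harmonic on the lattice

# Negative periodic second difference: -(v[n+1] - 2 v[n] + v[n-1]).
neg_second_diff = -(np.roll(v, -1) - 2.0 * v + np.roll(v, 1))

eigenvalue = 2.0 * (1.0 - np.cos(2.0 * np.pi * k / N))
residual = np.max(np.abs(neg_second_diff - eigenvalue * v))
print(eigenvalue, residual)
```

Because every harmonic is an eigenvector of such difference operators, the discretized operator acts on each coefficient vector independently, which is why the constraint reduces to a sum over harmonics of small matrix products.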
2.6 Minimization Problem 

By combining (1), (4), and (5), the image registration problem becomes

\[
\begin{aligned}
\hat{h}(x), \hat{g}(x) = \arg\min_{h(x),\, g(x)} \; & \int_\Omega |T(h(x)) - S(x)|^2 + |S(g(x)) - T(x)|^2 \, dx \\
& + \lambda \int_\Omega \|u(x) - \tilde{w}(x)\|^2 + \|w(x) - \tilde{u}(x)\|^2 \, dx \\
& + \rho \int_\Omega \|Lu(x)\|^2 + \|Lw(x)\|^2 \, dx
\end{aligned} \tag{7}
\]

where the constants $\lambda$ and $\rho$ are Lagrange multipliers used to enforce/balance
the constraints.




2.7 Estimation Procedure 

The transformations h and g that satisfy (7) were estimated using a gradient
descent algorithm to determine the basis coefficients $\{\mu_{ijk}, \eta_{ijk}\}$. The estimation
was accomplished by solving a sequence of optimization problems from coarse to
fine scale via increasing the number of the basis coefficient vectors $\{\mu_{ijk}, \eta_{ijk}\}$
during the estimation. This is analogous to multi-grid methods but here the
notion of refinement from coarse to fine is accomplished by increasing the number
of basis components. As the number of basis functions is increased, smaller and
smaller variabilities between the template and target images are accommodated.

3 Results 

Two MRI and two CT image volumes were used to evaluate the registration
algorithm. The data sets were collected from different individuals using the same
MR and CT machines and the same scan parameters. The MRI data sets
correspond to two normal adults and the CT data sets correspond to two 3-month-old
infants, one normal and one abnormal (bilateral coronal synostosis). The MRI
and CT data sets were chosen to test the registration algorithm when matching
anatomies with similar and dissimilar shapes, respectively.

The MRI data were preprocessed by normalizing the image intensities,
correcting for translation and rotation, and segmenting the brain from the head
using Analyze™. The translation aligned the anterior commissure points, and
the rotation aligned the corresponding axial and sagittal planes containing the
anterior and posterior commissure points, respectively. The data sets were then
down-sampled and zero padded to form a 64 × 64 × 64 voxel lattice. The CT
data sets were corrected for translation and rotation and down-sampled to form
a 64 × 64 × 50 voxel lattice. The translation aligned the basion skull landmarks,
and the rotation aligned the corresponding Frankfort Horizontal and midsagittal
planes, respectively.

The data sets were registered initially with zero and first order harmonics.
After every 40th iteration, the maximum harmonic was increased by one. The
MRI-to-MRI registration was terminated after 300 iterations and the CT-to-CT
registration was terminated after 200 iterations. Tables 1, 2, and 3 show the
results of four MRI experiments and four CT experiments. In order to isolate
the contribution of each term of (7), one experiment was done with no priors,
one with the linear-elastic model, one with the inverse consistency constraint,
and one with both priors. The four MRI experiments used the parameters: 1.
λ = ρ = 0, 2. λ = 0 and ρ = 50, 3. λ = 0.07 and ρ = 0, and 4. λ = 0.07 and
ρ = 50; and the four CT experiments used the parameters: 1. λ = ρ = 0, 2. λ = 0
and ρ = 25, 3. λ = 0.02 and ρ = 0, and 4. λ = 0.02 and ρ = 25. The labels MRI1
and CT1 are used to refer to results from the Case 1 experiments, and likewise
for 2 to 4.
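The coarse-to-fine schedule described above can be written down as a small helper. This is a sketch under an assumed indexing convention (iterations counted from zero, with the first 40 iterations using harmonics up to first order); the function name `max_harmonic` is hypothetical and the exact bookkeeping in the original implementation is not stated in the text.

```python
def max_harmonic(iteration, step=40, start=1):
    """Highest harmonic order in use at a given zero-based iteration.

    Assumes harmonics 0..start are active initially and that the maximum
    order grows by one after every `step` iterations.
    """
    return start + iteration // step

# Highest order reached under this convention for the two run lengths quoted.
mri_final = max_harmonic(300 - 1)   # MRI-to-MRI run, 300 iterations
ct_final = max_harmonic(200 - 1)    # CT-to-CT run, 200 iterations
print(mri_final, ct_final)
```

Under these assumptions the MRI run ends with harmonics up to order 8 and the CT run with harmonics up to order 5.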



Table 1. Cost Terms Associated with Transforming Image Volume T to S

Experiment   $C_1(T(h),S)$   $C_1(T(h),S)$   $\lambda C_2(u,\tilde{w})$   $\rho C_3(u)$   Total
             orig.           final           final                        final
MRI1         1980            438             0                            0               438
MRI2         1980            606             0                            85.7            692
MRI3         1980            482             33.4                         0               516
MRI4         1980            639             13.0                         74.6            727
CT1          454             27.0            0                            0               27.0
CT2          454             38.8            0                            28.1            66.9
CT3          454             28.5            3.15                         0               31.6
CT4          454             40.8            3.34                         28.3            72.4


Case 1 corresponds to unconstrained estimation in which h and g are
estimated independently. The numbers in the tables are consistent with this
observation. First, $C_2(u,\tilde{w})$ and $C_2(w,\tilde{u})$ show the largest error between the forward
and inverse mapping for each group of experiments. Secondly, the Jacobian values for
these cases are the lowest in their respective groups. This is expected because the
unconstrained experiments find the best match between the images without any
constraint preventing the Jacobian from going negative (singular). This is
further supported by the fact that the final values of $C_1(T(h),S)$ and $C_1(S(g),T)$
are the lowest in their groups.

Table 2. Cost Terms Associated with Transforming Image Volume S to T

Experiment   $C_1(S(g),T)$   $C_1(S(g),T)$   $\lambda C_2(w,\tilde{u})$   $\rho C_3(w)$   Total
             orig.           final           final                        final
MRI1         1980            512             0                            0               512
MRI2         1980            660             0                            78.3            738
MRI3         1980            539             33.6                         0               573
MRI4         1980            676             13.0                         73.7            727
CT1          454             30.6            0                            0               30.6
CT2          454             47.7            0                            32.4            80.1
CT3          454             34.6            3.43                         0               38.0
CT4          454             50.8            3.78                         31.9            86.5


Case 2 corresponds to independently estimating h and g while requiring
each transformation to satisfy the diffeomorphic constraint enforced by linear
elasticity. Just as in Case 1, the large difference between the forward and reverse
displacement fields as reported by $C_2(u,\tilde{w})$ and $C_2(w,\tilde{u})$ confirms that linear
elasticity alone is not sufficient to guarantee that h and g are inverses of one
another. We do, however, see that the linear elasticity constraint improved the
transformation over the unconstrained case because the minimum Jacobian and
the inverse of the maximum Jacobian are far from being singular.

Case 3 corresponds to the estimation problem that is constrained only by the
inverse transformation consistency constraint. The $C_2(u,\tilde{w})$ and $C_2(w,\tilde{u})$ values
for these experiments are much lower than those in Cases 1 and 2 because
they are being minimized. The transformations h and g are inverses of each
other when $C_2(u,\tilde{w}) + C_2(w,\tilde{u}) = 0$, so the smaller the costs $C_2(u,\tilde{w})$ and
$C_2(w,\tilde{u})$ are, the closer h and g are to being inverses of each other.



Table 3. Transformation Measurements

Experiment   Jacobian(h)   Jacobian(h)   Jacobian(g)   Jacobian(g)   $C_2(u,\tilde{w})$   $C_2(w,\tilde{u})$
             min           1/max         min           1/max
MRI1         0.257         0.275         0.100         0.261         28,300               29,500
MRI2         0.521         0.459         0.371         0.653         10,505               10,460
MRI3         0.315         0.290         0.226         0.464         478                  479
MRI4         0.607         0.490         0.410         0.640         186                  186
CT1          0.340         0.325         0.200         0.49          73,100               76,400
CT2          0.552         0.490         0.421         0.678         28,700               28,300
CT3          0.581         0.361         0.356         0.612         158                  171
CT4          0.720         0.501         0.488         0.725         167                  189

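As a quick consistency check between the tables, the weighted consistency costs reported in Table 1 should equal λ times the raw $C_2(u,\tilde{w})$ values in Table 3. The short sketch below carries out that arithmetic with the published numbers; the dictionary layout is just a convenience for the example.

```python
# lambda values quoted for the constrained experiments
lam = {"MRI3": 0.07, "MRI4": 0.07, "CT3": 0.02, "CT4": 0.02}
# raw C2(u, w~) values from Table 3
c2_raw = {"MRI3": 478, "MRI4": 186, "CT3": 158, "CT4": 167}
# weighted values lambda*C2(u, w~) reported in Table 1
reported = {"MRI3": 33.4, "MRI4": 13.0, "CT3": 3.15, "CT4": 3.34}

weighted = {name: lam[name] * c2_raw[name] for name in lam}
rel_diff = {name: abs(weighted[name] - reported[name]) / reported[name]
            for name in lam}
for name in lam:
    print(name, round(weighted[name], 2), reported[name])
```

The recomputed products agree with the tabulated values to within rounding (relative differences below one percent).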
Case 4 is the joint estimation of h and g with both the inverse consistency
constraint and the linear-elastic constraint. We can see that this produced the
best results because the differences between the inverse transformations, i.e.,
$C_2(u,\tilde{w})$ and $C_2(w,\tilde{u})$, were so small. Also, the minimum Jacobian of h is nearly the
inverse of the maximum Jacobian of g, and vice versa. In addition, the minimum
and one over the maximum Jacobian of h and g have their largest values for
this experiment (excluding one entry from MRI2). The MRI4 experiment shows
a better than twofold improvement over MRI3 with respect to the difference in
the inverse transformations, while the inverse transformation differences for
the CT4 and CT3 experiments are nearly equal. This suggests that the
inverse consistency constraint may be usable without the linear-elasticity constraint.
However, the minimum and one over the maximum Jacobian values are larger
for CT4 than CT3 and similarly for MRI4 and MRI3, suggesting less distortion.
The closer the minimum Jacobian is to one, the smaller the distortion of the
images.

Figure 2 shows three slices from the 3D result of Case 4 for both the MRI and
CT experiments. The first two columns show the template T and target S images
before transformation. The third and fourth columns show the transformed
template T(h) and target S(g). Columns 5, 6, and 7 show the x-, y-, and z-components
of the displacement field u used to deform the template and columns 8, 9, and 10
show the same for the displacement field w. The near invertibility in gray-scale
between the displacement fields u and w gives a visual impression that h and g
are nearly inverses of each other.

The time series statistics for the MRI4 and CT4 experiments are shown in Figs. 3
and 4. These graphs show that the gradient descent algorithm converged for each
set of transformation harmonics. In both cases, the cost functions $C_1(T(h),S)$
and $C_1(S(g),T)$ decreased at each iteration while the prior terms increased
before decreasing. Notice that the inverse consistency constraint increased as the
images deformed for each particular harmonic resolution. Then when the number
of harmonics was increased, the inverse constraint decreased before increasing
again. This is due to the fact that a low-dimensional Fourier series does not have
enough degrees of freedom to faithfully represent the inverse of a low-dimensional
Fourier series. This is easily seen by looking at the high dimensionality of a
Taylor series representation of the inverse transformation. Finally, notice that the
inverse consistency constraint caused the extremal Jacobian values of the
forward and reverse transformations to track together. This is easiest to see in the
CT4 experiment. Note that these extremal Jacobian values correspond to the
worst case distortions produced by the transformations.



4 Discussion 

The experiments presented in this paper were designed to test the validity of the
new inverse transformation consistency constraint as applied to a linear-elastic
transformation algorithm. As such, there was no effort made to optimize the
rate of convergence of the algorithm. The convergence rate of the algorithm can
be greatly improved by using a more efficient optimization technique than
gradient descent, such as conjugate gradient, at each parameterization resolution. In
addition, a convergence criterion can be used to determine when to increment
the number of parameters in the model. The CT data used in the experiments
were selected to stress the registration algorithm. The convergence of the
algorithm would have been much faster if the data sets had been adjusted for global scale
initially.

[Figure: two image panels, each with columns T, S, T(h), S(g), u1, u2, u3, w1, w2, w3]

Fig. 2. Images associated with the MRI4 and CT4 experiments

[Figure: four plots against Iterations: $C_1(T(h),S)$ and $C_1(S(g),T)$; $C_3(u)$ and $C_3(w)$; $C_2(u,\tilde{w})$ and $C_2(w,\tilde{u})$; Minimum and Maximum Jacobians]

Fig. 3. Statistics associated with the MRI4 experiment

It is important to track both the minimum and maximum values of the
Jacobian during the estimation procedure. The Jacobian measures the differential
volume change of a point being mapped through the transformation. At the start
of the estimation, the transformation is the identity mapping and therefore has
a Jacobian of one. If the minimum Jacobian goes negative, the transformation is
no longer a one-to-one mapping and as a result folds the domain inside out.
Conversely, the reciprocal of the maximum value of the Jacobian corresponds
to the minimum value of the Jacobian of the inverse mapping. Thus, as the
maximum value of the Jacobian goes to infinity, the minimum value of the
Jacobian of the inverse mapping goes to zero. In the present approach, the inverse
transformation consistency constraint was used to penalize transformations that
deviated from their inverse transformation. A limitation of this approach is that
the cost function in (4) is an average metric and can not enforce the pointwise
constraints that $\min_x \{J(h)\} = 1/\max_x \{J(g)\}$ and $\min_x \{J(g)\} = 1/\max_x \{J(h)\}$. This
point is illustrated by Table 3 by the fact that the minimum values of J(h)
and J(g) differ from the reciprocal of the maximum values of J(g) and J(h),
respectively. However, these extremal Jacobian values do give an upper bound
on the worst case distortions produced by the transformations, demonstrating
the consistency between the forward and reverse transformations.
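The extremal Jacobian values discussed above can be monitored with a few lines of array code. The following 2D sketch is illustrative only: the function name `jacobian_determinant_2d`, the finite-difference scheme (`np.gradient`), and the sinusoidal test displacement are assumptions made for the example, whereas the paper works with 3D fields.

```python
import numpy as np

def jacobian_determinant_2d(u1, u2, spacing):
    """Jacobian determinant of h(x) = x + u(x) on a regular 2D lattice.

    u1, u2 are the displacement components, indexed as [x1, x2].
    """
    du1_dx1, du1_dx2 = np.gradient(u1, spacing)
    du2_dx1, du2_dx2 = np.gradient(u2, spacing)
    return (1.0 + du1_dx1) * (1.0 + du2_dx2) - du1_dx2 * du2_dx1

n = 65
x = np.linspace(0.0, 1.0, n)
spacing = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")

# Identity mapping: zero displacement, so the Jacobian is one everywhere.
J_identity = jacobian_determinant_2d(np.zeros_like(X1), np.zeros_like(X1), spacing)

# Smooth test displacement along x1 only; analytically J = 1 + 0.1*pi*cos(2*pi*x1).
u1 = 0.05 * np.sin(2.0 * np.pi * X1)
u2 = np.zeros_like(X1)
J = jacobian_determinant_2d(u1, u2, spacing)

j_min = J.min()            # worst-case local compression
inv_j_max = 1.0 / J.max()  # reciprocal of worst-case local expansion
print(j_min, inv_j_max)
```

A minimum that stays positive indicates that the sampled transformation has not folded, and comparing the minimum of one transformation with the reciprocal maximum of the other gives the kind of summary reported in Table 3.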

5 Summary and Conclusions 

This paper presented a new algorithm for jointly estimating a consistent set of
transformations that map one image to another and vice versa. A new
parameterization based on the Fourier series was presented and was used to simplify
the discretized linear-elasticity constraint. The Fourier series parameterization
is simpler than our previous parameterizations and each basis coefficient can be
interpreted as the weight of a harmonic component in a single coordinate
direction. The algorithm was tested on both MRI and CT data. It was found that
the unconstrained estimation leads to singular or near-singular transformations. It was
also shown that the linear-elastic constraint alone is not sufficient to guarantee
that the forward and reverse transformations are inverses of one another. Results
were presented that suggest that even though the inverse consistency constraint
is not guaranteed to generate nonsingular transformations, in practice it may be
possible to use the inverse consistency as the only constraint. Finally, it was
shown that the most consistent transformations were generated using both the
inverse consistency and the linear-elastic constraints.

[Figure: four plots against Iterations: $C_1(T(h),S)$ and $C_1(S(g),T)$; $C_3(u)$ and $C_3(w)$; $C_2(u,\tilde{w})$ and $C_2(w,\tilde{u})$; minJ(h), minJ(g), 1/maxJ(h), 1/maxJ(g)]

Fig. 4. Statistics associated with the CT4 experiment

Abstract. In this paper we classify inhomogeneous non-linear
registration algorithms into those of variable data influence, of variable
deformability and of variable model type. As examples we introduce three
modifications of the viscous fluid registration algorithm: passing a filter over
the computed force field, adding boundary conditions onto the velocity
field, and re-writing the viscous fluid PDE to accommodate a
spatially-varying viscosity field. We demonstrate their application on artificial test
data, on pre-/post-operative MR head slices and on MR neck volumes.



1 Introduction 

Image registration requires finding an optimal transformation between an image
pair, the source S(x) and target T(x). Single-level registration algorithms are
divided between those which apply linear transformations and those which allow
higher order deformations. Generally higher order deformations are performed
after an initial registration by a linear method, so linear and non-linear are
combined sequentially. In earlier work we examined the application of hierarchies of data, warp
and model, where complexity increases temporally with the progress of
registration. It is rare that an algorithm allows simultaneous or parallel application of
both linear and non-linear models within one image, so that only selected areas
of the image deform. Many medical images contain regions representing both soft
and hard tissue, and whereas the former often require high order deformations
to achieve a good registration, in an intra-subject study the hard tissue regions
should remain rigid. Registration of such image pairs requires algorithms where
the model varies spatially within the image domain, using prior information on
the variation of tissue types within the deforming image. These are instances
of inhomogeneous non-linear registration algorithms. This paper classifies types
of inhomogeneity and reviews those available in the literature. We then present
three modifications to the fluid algorithm which introduce inhomogeneities into
its application. Section 3 describes inhomogeneities in applying the force field
and in computing the velocity field, and presents the varying-viscosity fluid
registration algorithm. Finally, Sect. 4 shows results of application of these algorithms
to 2- and 3-dimensional data.

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 238- l2All 1999. 

© Springer-Verlag Berlin Heidelberg 1999

2 Spatial Inhomogeneities in Registration Algorithms

In earlier work we classified temporal hierarchies of registration into those
of data, those of descriptors of deformation or warp, and those of complexity
of model. In a similar vein, we classify spatial inhomogeneities as those in
the data influence, in the strength of deformation constraints, and in the
application of model type.



2.1 Variable Data Influence 

The first type of inhomogeneity varies the importance attached to information
content in the domain of the image pair when computing the transformation
required at each point in the source. In terms of a Bayesian approach, where
the deformation is determined by the solution of a weighted sum of likelihoods
and priors, the weight assigned to the likelihood is varied according to
assumptions about the relevance of the data in different regions of the image.
In terms of regularisation, where the equation solved is a weighted sum of
driving forces and constraints on the deformation, the influence of the driving
force is weakened or strengthened relative to the deformation constraints.

By ignoring the contribution of the driving force, we can force a region to be 
passive, whose deformation is due solely to its proximity to active regions. 

Let $\Omega = \{x\}$ be the domain of the image. We make the following
definitions:

Definition 1 (Active and passive regions). Let $\Theta \subset \Omega$ be a
region whose deformation is given by $u(x)$ satisfying a regularisation
equation

$$ g(u(x)) + \tau f(S(u(x)), T(x)) = 0 \tag{1} $$

where $f$ is the likelihood and $g$ is the prior constraint dependent on the
deformation. We define $\Theta$ to be passive if, for all pixels
$x \in \Theta$, the regularisation parameter $\tau$ weighting the likelihood is
equal to zero, and active otherwise.

A medical application would be an intra-modality pair of which the source 
contains known segmented structures whose homologues are absent in the target 
but which may be confused, due to similarities of intensity, with regions nearby. 



2.2 Variable Deformability 

The second inhomogeneity paradigm varies the strength of deformation
constraint. Regions of the image are then classified as strongly or weakly
deformable. In the case of registration modelled on the behaviour of a physical
material, the deformability is described by one or more parameters of the
material properties: the elasticity of an elastic medium (Sect. 3.3) or the
viscosity of a fluid (Sect. 4.3). Allowing these parameters to vary spatially
requires the derivation of modified Partial Differential Equations (PDEs) to
account for the parameter gradients.

Definition 2 (Strongly and weakly deformable). Let $\Theta \subset \Omega$ be a
region whose deformation is given by $u(x)$ satisfying a regularisation
equation

$$ g(u(x); \mu) + \tau f(S(u(x)), T(x)) = 0 \tag{2} $$

where $f$ is the likelihood and $g$ is a prior constraint dependent on the
deformation and on an independent parameter $\mu \in [0, 1]$ varying spatially
within the image, such that $g(\mu) \to 0$ as $\mu \to 0$. We define $\Theta$
to be strongly or weakly deformable according to the range of $\mu$ from 0
(strongest) to 1 (weakest).

The strength of the deformation parameter is supplied at every position in
the source image, using prior information obtained from one of two sources:
physical or statistical. In the first case, prior information is available on
the deformability of the physical tissues which the images represent. For
example, tissue elasticity can be estimated using certain scanning protocols,
and we assume the rigidity of hard tissue. Basing the variability of the
deformation constraint on such physical information is valid only in
intra-subject studies, where the registration of the source to the target
attempts to reproduce actual physical movements of tissue. A second type of
prior information is applicable to and derived from cross-population studies,
where the variation in the deformation constraint is a function of the
statistical cross-population variability in the shape of each structure in the
images. Structures which have been found to display little variance in size and
shape across a population of normals will be labelled with a high value of
$\mu$, while other areas exhibiting greater variability will be labelled with a
low value of $\mu$ and allowed greater deformations in registering to their
homologues in the target image. For instance, Davatzikos allows high
variability in ventricular and cortical fold regions, and low variability in
subcortical structures, in a variable-elasticity algorithm (Sect. 3.3).

2.3 Variable Model Type 

Finally it is possible to vary spatially the models or equations causing
deformation. This type of inhomogeneity can achieve completely affine
transformations within selected regions while deforming intervening or
surrounding areas. Boundary conditions are set between model type regions such
that a continuity of mapping is ensured across the image. Examples of such
algorithms are the Combination MultiQuadric (C-MTQ), Sect. 3.2, the three
component Finite Element (3C-FEM), Sect. 3.1, and a version of the modified
fluid 2 (MF2), Sect. 4.2. In the case of 3C-FEM and MF2, the updating of nodal
displacements or of pixel velocities is prohibited within selected areas. This
is an easy and effective method of ensuring that these regions remain rigid;
additionally they remain motionless.

We define here the concept of a rigid body within the deforming source, 
together with two paradigms of a rigid region. 

Definition 3. A region $\Theta \subset \Omega$ is said to be rigid within a
non-linear deformation of source $S(\Omega)$ to target $T(\Omega)$ if the
transformation $u(\Theta)$ is linear.

Definition 4. A region $\Theta \subset \Omega$ is said to be motionless if
$\forall x \in \Theta$, the transformation $u(x) = 0$. Where the registration
$S(u(\Omega,t)) \to T(\Omega)$ is a function of time, $\Theta$ is motionless if
$\frac{\partial}{\partial t} u(\Theta,t) = 0$ $\forall x \in \Theta, \forall t$
and if $u(\Theta,0) = 0$.

It may be desirable to have rigid but independently-moving regions:

Definition 5. A region $\Theta \subset \Omega$ is said to be
independently-moving if $\forall x \in \Theta$, the transformation
$u(x,t) = c(x,t)$ is a non-zero linear function of $x$ where
$c(x_i) = c(x_j)$ $\forall x_i, x_j \in \Theta$ at any time $t$ and where
$\exists x \in \Omega, x \notin \Theta$ such that $u(x,t) \neq c(x,t)$
satisfies the regularisation of a likelihood and prior.

The main application of such algorithms will be in the modelling of movement 
of hard and soft tissue during surgery. 



3 Review of Inhomogeneous Registration Methods 



3.1 Three-component Finite Element Model (3C-FEM) 

This finite-element model is based on three tissue types, labelled rigid,
deformable and ‘fluid’. The deformations are driven by user-supplied landmark
displacements, and deformations of the deformable regions are constrained by
three energy terms, in which $N^0_{ij}$ is the original distance between the
nodes $N_i$ and $N_j$ and the nodes $N_i$, $N_j$ and $N_k$ are collinear before
deformation; nodes $N_k$, $N_l$ and $N_m$ form a triangle with initial area
$A_0$ and deformed area $A$, and $\gamma$ is a threshold of triangular area
reduction. The ‘fluid’ deformations are constrained only by the term
$E_{\text{fold}}$, which prevents folding of the image. Rigid regions are
obtained by prohibiting the updating of their nodal displacements.



3.2 Combination Multiquadric Spline (C-MTQ) 

Little et al. have constructed a variant of the landmark spline, incorporating
regions which undergo independent linear transformations only. The method is
applied to pre-segmented images, with regions classified as hard or soft
tissue. The hard tissue regions form a set $O$ of $n$ rigid bodies $\{O_i\}$
such that $O = \bigcup_{i=1}^{n} O_i$, where one body $O_i$ can consist of
separate parts (all undergoing the same linear transformation) but no two
bodies may overlap. The method uses a distance transform to weight differently
the linear and non-linear components of the overall image mapping, such that
the non-linear terms are smoothly reduced to zero as the rigid bodies are
approached, and each rigid body is constrained to its own linear mapping while
contributing to the underlying linear drift of the non-rigid areas.

3.3 Elastic Registration with Variable Elasticity

Davatzikos presents an elastic registration model applied to images of the
head where the elasticity parameters vary spatially within the image. The
deformation is driven by distances between parametrically-defined pre-segmented
cortical and ventricular surfaces in the source and target, and also
incorporates a pre-strained elasticity term. The latter allows for voluntary
growth in specified image areas, for example to model the growth of a tumour.

First the brain tissue is segmented from the images and a deformable surface
is applied to the source and target brain volumes, giving for each a parametric
description of the shape of the outer cortical surface. At each point on the
ventricular surface in the deforming source, a force is computed from the
distance to the nearest point on the boundary of the target ventricular
surface, weighted by the scalar product of outward normals at these points.
These ventricular forces together with cortical forces derived from matching
cortical surfaces by curvature measures provide a total external driving force
field $f$ which is supplied to the variable-elasticity equation:

$$ \{ f + \lambda \nabla^2 u + (\lambda + \mu) \nabla (\nabla \cdot u) \}
 + \{ (\nabla u + (\nabla u)^T - 2I) \nabla \lambda + (\nabla \cdot u - 3) \nabla \mu \}
 + \{ e (2 \nabla \lambda + 3 \nabla \mu) + (2 \lambda + 3 \mu) \nabla e \} = 0. \tag{3} $$

The first bracketed term is the regularisation between driving forces $f$ and
the elasticity constraints on the displacement vector $u(x)$. The second
contains gradients in the elasticity parameters $\lambda$ and $\mu$, allowing
variation in the elasticity field; ventricular and cortical surface regions are
assigned lower elasticity values, allowing for greater ease of deformation. The
third term contains gradients in a parameter $e$ determining an additional
strain tensor $E_0 = e(x) I$ which forces extra expansion or contraction in
pre-selected regions. Hence the algorithm also contains inhomogeneities in
activity, or data influence.



4 Modifications to the Viscous Fluid Registration Model 

The fluid PDE is summarised by

$$ \nabla \cdot \sigma + f = 0 \tag{4} $$

where $f$ is the driving force and $\sigma$ is the stress tensor, given by

$$ \sigma = -pI + \mu \left( \nabla v + (\nabla v)^T \right) \tag{5} $$

where $v$ is the velocity field, $p$ is a pressure term and $\mu$ is the
viscosity parameter.

We now describe methods of introducing inhomogeneities into the application 
of the fluid algorithm such that deformation is reduced or prohibited in areas 
specified as passive, motionless or weakly deformable. They are intended for use 
with prior estimates of the rigidity or cross-population variability of different 
tissue structures identified in a rough initial segmentation. 

4.1 Modified Fluid 1 (MF1)

MF1 utilises prior knowledge of regions whose intensity information we do not
wish to contribute to driving the registration. These regions will be passive
in the registration. A binary array is provided whose pixels, corresponding to
those in the source, are flagged as passive or active. It is subjected to the
same deformations as the source image and is supplied to a Euclidean Distance
Transform (EDT) which specifies the distance $x$ from the passive regions. At
each timestep in the fluid algorithm, forces at each pixel in the source are
computed from source-target intensity differences and from intensity gradients
in the source. Prior to solution of the PDE, the forces are multiplied by the
weighting function (6), which smoothly reduces them to zero in the
neighbourhood of the passive regions.

$$ \begin{cases} 0 & : x < a \\ \ldots \cos \ldots & : a < x < b \\ 1 & : x > b \end{cases} \tag{6} $$

with $\alpha = 2$. We used $a = 1$ and $b = 13$ and found the 4SED algorithm to
be an adequate approximation to the Euclidean distance.
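
The force weighting of MF1 can be illustrated with a minimal sketch. It assumes
a raised-cosine ramp between a and b as the smooth transition of (6), which is
an assumption and not necessarily the exact expression used. It also replaces
the 4SED approximation with a brute-force Euclidean distance, practical only
for small arrays. The names `ramp_weight`, `distance_to_passive` and
`weight_forces` are hypothetical.

```python
import math

def ramp_weight(x, a=1.0, b=13.0):
    """Weight in [0, 1] for a pixel at distance x from the nearest passive pixel."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    # Assumed raised-cosine transition: 0 at x = a, rising smoothly to 1 at x = b.
    return 0.5 - 0.5 * math.cos(math.pi * (x - a) / (b - a))

def distance_to_passive(passive):
    """Brute-force Euclidean distance from each pixel to the nearest passive pixel."""
    rows, cols = len(passive), len(passive[0])
    seeds = [(r, c) for r in range(rows) for c in range(cols) if passive[r][c]]
    if not seeds:
        # No passive region: every pixel is infinitely far away (weight 1).
        return [[math.inf] * cols for _ in range(rows)]
    return [[min(math.hypot(r - sr, c - sc) for sr, sc in seeds)
             for c in range(cols)] for r in range(rows)]

def weight_forces(force, passive, a=1.0, b=13.0):
    """Multiply a scalar force image by the distance-dependent weight."""
    dist = distance_to_passive(passive)
    return [[force[r][c] * ramp_weight(dist[r][c], a, b)
             for c in range(len(force[0]))] for r in range(len(force))]
```

In a registration loop the passive mask would be deformed along with the source
before each call, so that the weights follow the passive region as it moves.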

4.2 Modified Fluid 2 (MF2) 

This method allows for specified regions to remain rigid by prohibiting their 
pixel movements. A binary array, labelling pixels as either motionless or mobile, 
is passed as extra boundary conditions to the SOR function solving the fluid 
PDE in each timestep. Only velocities at mobile pixel locations are updated; 
those labelled motionless remain at zero velocity. 
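
The masked update can be sketched on a simplified problem. The example below
is an illustration under stated assumptions, not the solver used in the paper:
it applies SOR to a scalar Poisson-type equation (Laplacian of v plus force
equals zero, unit grid spacing, zero boundary values) instead of the full fluid
operator, and the name `sor_masked` and the relaxation factor are hypothetical.

```python
def sor_masked(force, motionless, omega=1.5, iterations=200):
    """SOR sweeps for laplacian(v) + force = 0 on a 2D grid with zero boundary.

    Pixels flagged as motionless are skipped, so they keep zero velocity and
    act as extra boundary conditions for their mobile neighbours.
    """
    rows, cols = len(force), len(force[0])
    v = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if motionless[r][c]:
                    continue  # velocity is never updated here
                # Gauss-Seidel estimate from the 5-point stencil.
                estimate = 0.25 * (v[r - 1][c] + v[r + 1][c]
                                   + v[r][c - 1] + v[r][c + 1] + force[r][c])
                # Over-relax towards the estimate.
                v[r][c] += omega * (estimate - v[r][c])
    return v
```

Because motionless pixels are skipped entirely, the work per sweep falls in
proportion to the number of pixels flagged.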

4.3 Modified Fluid 3 (MF3) 

The third modification varies the viscosity parameter $\mu$ spatially over the
image. We expand (4) using (5), ignoring $p$, to give the PDE for the variable
viscosity fluid model (with summation over the repeated index $j$):

$$ (\nabla \mu \cdot \nabla) v + \frac{\partial \mu}{\partial x_j} \nabla v_j
 + \mu \nabla^2 v + \mu \nabla (\nabla \cdot v) + f = 0. \tag{7} $$

Since the partial differential operator now varies over the image, a fast
solution by a basis function expansion or by convolution with filters derived
from its Green’s functions is no longer possible; instead we use the successive
over-relaxation (SOR) iterative method [11].
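
The extra gradient terms of the variable-viscosity PDE can be checked by
expanding the divergence of the viscous stress component by component. The
derivation below is a sketch that assumes the symmetric-gradient stress
$\sigma = -pI + \mu(\nabla v + (\nabla v)^T)$, drops the pressure term, and
sums over the repeated index $j$:

```latex
\begin{align*}
[\nabla \cdot \sigma]_i
  &= \frac{\partial}{\partial x_j}
     \left[ \mu \left( \frac{\partial v_i}{\partial x_j}
                     + \frac{\partial v_j}{\partial x_i} \right) \right] \\
  &= \frac{\partial \mu}{\partial x_j} \frac{\partial v_i}{\partial x_j}
   + \frac{\partial \mu}{\partial x_j} \frac{\partial v_j}{\partial x_i}
   + \mu \nabla^2 v_i
   + \mu \frac{\partial}{\partial x_i} (\nabla \cdot v).
\end{align*}
```

The first two terms contain the viscosity gradient and vanish when $\mu$ is
constant, which recovers the constant-viscosity operator
$\mu \nabla^2 v + \mu \nabla(\nabla \cdot v)$.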

5 Results 

5.1 Synthetic Labelled Images 

We created a set of 4 artificial images of size 256 x 256, labelled house,
clown, house2 and clown2, illustrated in Fig. 1. All 4 contain 5 corresponding
structures: roof (hat) of intensity 87; shadow (hair) of intensity 39; wall
(face) of intensity 127; windows (eyes and mouth) of intensity 215; and
background of intensity 255.






Fig. 1. Artificial images: (left-right) clown, house, clown2, house2 



We applied unmodified (UF) and modified (MF1-3) fluid to register the house to
the clown images, with the house door defined as normal, weakly-deformable,
passive or motionless to compare the ability of the algorithms to reduce or
prohibit deformation. Figures 2 and 3 show the results.






Fig. 2. Progress of the UF registration of house to clown image 



To measure the deformation of the door, we applied to a grid image the same
deformation as that of the source, and noted the locations of gridpoints in
the door region before and after registration. For each registration, we
inserted these into a thin-plate spline and computed its bending energy. The
results were: UF (target: clown) 1.018; MF3 (target: clown) 0.065; MF3
(target: clown2) 0.058; MF1 (target: clown2) 0.045; MF2 (target: clown2) 0.
Fig. 3 (bottom row) shows the final deformations of the door regions in the
grids.

Fig. 3. Registration of S = house2 to T = clown/clown2, restricting deformation
of the door. (Columns, left-right): UF (T = clown); MF3 (T = clown); MF3
(T = clown2); MF1 (T = clown2); MF2 (T = clown2). (Rows, top-bottom): Deformed
S; T - deformed S; original S - deformed S; door regions of corresponding
deformed grids

5.2 Pre/post-operative Head Images, Rigid Scalp

The next exercise attempted to reduce deformation at an area where
source-target differences are known a priori in order to highlight other areas
where there are unknown differences due to abnormalities. The target and source
were coronal slices of a pre-/post-operative data set exhibiting hydrocephalus
and coning (Fig. 4). Since these are slices through the same subject at
approximately, but not exactly, the same location, they exhibit slight
differences in scalp shape. To some extent the same applies to the cortex and
internal brain structures; however the main cause of their source-target
differences was the surgical procedure and its after-effects. It is these
differences which were to be highlighted as abnormal.

We compared registration by the UF, by MF3 and by MF2. 

The target (pre-operative) image was used as an atlas, defining the normal 
brain shape for that subject. The scalp region was segmented manually with the 
aid of the display tool xdispunc developed by Dave Plummer of UCH Medical 
Physics. This prior information of known ‘abnormal’ scalp shape was supplied 
as a binary image indicating the region where deformation was to be reduced. 
Five sets of images were generated for each registration: 

1. the deformed source image, S(u), after registration to the target T.

2. the same deformation applied to a regular grid image, G(u). The ideal
   inhomogeneous registration paradigm would exhibit no deformation in G(u)
   in the scalp region (painted white on the grid prior to deformation).

3. local magnitudes of the resulting displacements field and local Laplacian,
   bending and elasticity energy metrics, to highlight regions of severe
   distortion of the source image from the normal brain shape.

4. the difference of the deformed source from the undeformed source, S - S(u).
   Assuming registration to the target is optimal in regions where no prior
   information was supplied, this gives a further indication of abnormality,
   defined as shape difference from the target. (Known) differences in the
   scalp region are not highlighted if its registration to the target has been
   successfully suppressed.

5. the difference of the deformed source from the target, T - S(u). This is to
   check that registration is complete in the unknown regions and suppressed
   in the scalp. In this case, for an ideal registration, the difference image
   is zero in unknown regions and non-zero in the scalp region.

Fig. 4. Data set of (left) head4A and (right) head4B

The results are shown in Fig. 5. Of the three registrations, in S - S(u), MF2
with motionless scalp is the clearest at highlighting only the differences in
the ventricular and left-cortical areas (Fig. 5, bottom row, far right). By
inspecting the grid lines in the white regions of G(u) (Fig. 5, bottom row,
centre left) we see it has respected the rigidity of the scalp. Finally, to
check completeness of registration, Fig. 5 (bottom row, centre right) shows
good registration in the brain region (the difference image shows little
structure) and poor registration at the scalp. In comparison, MF3 allowed the
scalp to distort, shown in the grid image (Fig. 5, centre row, centre left),
leaving less structure in the scalp region of T - S(u) (Fig. 5, centre row,
centre right); hence the known scalp shape difference shows up as an
abnormality in S - S(u) (Fig. 5, centre row, far right).

Fig. 5. Results of registering source S = head4B to target T = head4A. (Rows,
top-bottom): UF; MF3; MF2. (Columns, left-right): deformed S, S(u); deformation
applied to grid, G(u); T - S(u), showing completeness of registration;
S - S(u), highlighting differences due to the deformation (right 2 columns are
contrast-enhanced)

Fig. 6. Deformation metrics (left-right): Laplacian, bending, elastic,
magnitude of transformation, of the registration by (top) UF and (bottom) MF3

5.3 3D Results - Neck Images

3-dimensional versions of UF and MF3 were compared on MRI neck volumes with the
vertebrae defined as weakly deformable. The original images were acquired at
the Hammersmith Hospital (footnote 1). Two full-3D neck volumes were provided,
neckD and neckI. NeckI was of the chin down and neckD was with the head flexed
backwards within the confines of the scanner bore. Both were of the same
subject. Since the imaging field of view had been the spinal column
specifically, the volumes did not extend to include the whole neck laterally.
The images exhibited a strong intensity ramp, with high values at the back of
the neck and total signal loss in the face. We pre-processed the images to give
a more uniform range of intensities in the anterior-posterior direction, using
the scheme illustrated in Fig. 7. Due to memory capacity and time constraints,
it was not feasible to apply the fluid registrations directly to the
full-resolution data sets; hence they were downsampled, by blurring with a
Gaussian of standard deviation σ = 2 and storing alternate pixels (Fig. 7,
right). Since the Gaussian blurring for downsampling, and intensity gradient
calculations during registration, were performed in the Fourier domain and
required image dimensions of powers of two, we used zero padding to give full
resolution dimensions 256 x 256 x 64 and downsampled volumes of 128 x 128 x 32.

Footnote 1: The scanner was a Picker 1.0 T HPQ. The acquisitions were RF
spoiled volume scans with TR = 42, TE = 7, 192 x 255 matrix, 1 NEX, 30 cm FoV,
38 x 2 mm slices. The final images were Fourier interpolated to a 256 x 256
matrix. A c-spine quadrature surface coil was used for reception. All 3 volumes
were 16-bit, of dimension 256 x 256 x 38, with the neck vertebrae as the field
of view; pixel dimensions were 1.17188 x 1.17188 x 2 mm.
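
The downsampling step (Gaussian blur followed by keeping alternate pixels) can
be sketched in two dimensions. This is an illustrative stand-in: it convolves
in the spatial domain with a truncated kernel, whereas the text performs the
blur in the Fourier domain, and the function names and the 3σ truncation
radius are assumptions.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Normalised 1D Gaussian kernel, truncated at 3 sigma by default."""
    radius = int(3 * sigma) if radius is None else radius
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(signal, kernel):
    """Convolve a 1D signal with a symmetric kernel, treating outside values as zero."""
    radius = len(kernel) // 2
    n = len(signal)
    return [sum(kernel[k + radius] * signal[i + k]
                for k in range(-radius, radius + 1) if 0 <= i + k < n)
            for i in range(n)]

def downsample_2d(image, sigma=2.0):
    """Blur separably with a Gaussian, then keep every second pixel in each axis."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]               # blur along x
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]    # blur along y
    blurred = [list(row) for row in zip(*cols)]                  # transpose back
    return [row[::2] for row in blurred[::2]]
```

A 3D version would add a third separable pass along the slice axis before
subsampling.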




Fig. 7. Pre-processing stages shown on neckD. (left-right): slice 19 of the
original 3D 256 x 256 x 38 volume; after division pixelwise by the same image
blurred with a Gaussian of spatial standard deviation σ = 5; masked with the
aid of the automatic contouring and manual alteration tool in xdispunc;
downsampled by a half



We segmented the spinal vertebrae slice-by-slice from the full-resolution
source volumes using the xdispunc display tools. The contrast between vertebrae
and intervening tissue was variable and so segmentation was performed manually
with reference to an atlas. The segmentations were converted to binary spine
volumes which were then downsampled using the same process as for the necks.

Both fluid registration tests (UF, MF3) were applied in a six-level scale space
(Gaussian blurs of spatial standard deviation 2i with i = {5, 4, 3, 2, 1, 0}).
Within each scale level, the fluid was set to iterate through at least three
timesteps, with an optional extra 100 timesteps until the stopping criterion
was met, the stopping criterion being a change in correlation coefficient
smaller than a preset threshold (a small negative power of ten). On
termination, we upsampled the displacement fields obtained from both fluid
registrations of neckD to neckI and applied them to the original neckD images,
to give full-resolution deformations. These are shown in Fig. 8 (top, centre).
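
The scale-space schedule and stopping rule can be sketched as a control loop.
Everything here is hypothetical scaffolding: `step` stands for one fluid
timestep at a given blur and is assumed to return the current correlation
coefficient; the tolerance `tol` is a placeholder value; and the mapping from
level i to blur width is passed in as `sigma_of_level`, defaulting to 2i as
printed, although the expression could equally be read as a power of two.

```python
def run_scale_space(step, levels=(5, 4, 3, 2, 1, 0),
                    sigma_of_level=lambda i: 2 * i,
                    min_steps=3, extra_steps=100, tol=1e-4):
    """Run `step(sigma)` coarse-to-fine and record (sigma, correlation) pairs.

    Each level runs at least `min_steps` timesteps and at most
    `min_steps + extra_steps`, stopping early once the correlation
    coefficient changes by less than `tol` between consecutive timesteps.
    """
    history = []
    for i in levels:
        sigma = sigma_of_level(i)
        previous = None
        for n in range(min_steps + extra_steps):
            cc = step(sigma)
            history.append((sigma, cc))
            stalled = previous is not None and abs(cc - previous) < tol
            previous = cc
            if n + 1 >= min_steps and stalled:
                break
    return history
```

Passing `sigma_of_level=lambda i: 2 ** i` gives the power-of-two reading of the
schedule without changing the rest of the loop.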

We applied the transformation fields produced by both fluid tests to the
initial spine volumes segmented from neckD; the results of the volume-rendered
spines are shown from two angles in Fig. 9. The UF registration shows an
extension of the upper two vertebrae of the spines in comparison with the
original segmentation from neckD (Fig. 9, far right).

Fig. 8. (top) Central slices of full resolution neckD (left) and neckI (right),
(centre-left) UF and (centre-right) MF3 registration of neckD to neckI.
(bottom) Central slices of (left to right): neckD - neckI; UF neckD - neckI;
MF3 fluid neckD - neckI




Fig. 9. (left-right) original spineD; spineD after UF registration of neckD to
neckI; spineD after MF3 registration of neckD to neckI with vertebrae weakly
deformable; (far right): the upper three vertebrae after (left) UF and (right)
MF3 registration. The 3D images were volume-rendered using the Analyze package



Figure 10 shows logs of the Laplacian and elasticity energies as local
deformation metrics computed from the displacements of both registrations of
neckD to neckI. The deformation metrics clearly show dark patches in the
vertebrae in MF3 indicating low distortion: compare Fig. 10 (left) with those
of the UF (Fig. 10, right).

Fig. 10. Central slices of the deformation metric images, after registration of
neckD to neckI: (left) log of Laplacian, UF; (centre left) log of elasticity
energy, UF; (centre right and far right) the equivalent for MF3

5.4 Computational Time

Solution of the fluid PDE in the spatial domain using finite differencing and
relaxation is slow. We restricted the upper limit of the number of SOR
iterations within each timestep to 4000 (footnote 2) in the 2D case and to 50
in the 3D case. This provided a compromise between speed and accurate solution
of the PDE, since within each timestep computation of velocities is approximate
and is improved on in the next timestep. For images sized 32 x 128^2, 50 SOR
iterations took 2 minutes 36 seconds for the constant-viscosity fluid on a Sun
UltraSPARC, equivalent to 416 minutes for 4000 iterations for a 64 x 128^2
image. Full-multigrid (FMG) solution of the same PDE applied to the
displacement field has been implemented for elastic registration, with an
estimate of 6,592 Jacobi relaxation iterations for a 64 x 128^2 grid. For a
grid sized J x J, SOR requires only a fraction of the number of iterations
needed by Jacobi relaxation. Two 3D FMG cycles have been shown to be sufficient
for a 64 x 128^2 grid to solve the elasticity PDE per iteration, giving 20
minutes per iteration. Hence we estimate that replacing SOR with FMG would
speed the fluid registration by a factor of 20.

Footnote 2: Generally around 1300 were sufficient for the norms calculated from
the residuals to drop to less than 0.1% of those calculated at the start of
each iteration cycle.

MF1 additionally multiplies the force at each pixel by the weighting function
(6) once per timestep, a minimal overhead compared to the SOR iterations. MF2
provides a considerable gain in speed over UF, depending on the volume
percentage of rigid bodies (pixels whose velocities are not computed). The
percentage of motionless regions in the image is equal to the percentage
speed-up in the SOR solution. For MF3, extra finite differencing computations
are added to the SOR due to the extra terms in (7); we timed 3 minutes 47
seconds for 50 SOR iterations for images sized 32 x 128^2.
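
The timing estimates can be reproduced with a few lines of arithmetic. The
sketch assumes that SOR cost scales linearly with both the iteration count and
the number of voxels; the variable names are illustrative only.

```python
sor_seconds_per_50 = 2 * 60 + 36     # constant-viscosity fluid, 32 x 128^2 volume
mf3_seconds_per_50 = 3 * 60 + 47     # variable-viscosity fluid, same volume
voxel_ratio = (64 * 128 ** 2) / (32 * 128 ** 2)   # 64 x 128^2 versus 32 x 128^2

# Scale 50 iterations up to 4000 and to the larger volume, then convert to minutes.
sor_minutes = sor_seconds_per_50 * (4000 / 50) * voxel_ratio / 60
fmg_minutes = 20.0                   # quoted FMG cost for the 64 x 128^2 grid
speedup = sor_minutes / fmg_minutes
mf3_overhead = mf3_seconds_per_50 / sor_seconds_per_50 - 1.0

print(f"SOR: {sor_minutes:.0f} min, FMG speed-up: {speedup:.1f}x, "
      f"MF3 overhead: {mf3_overhead:.1%}")
```

Under these assumptions the SOR figure comes to 416 minutes, the ratio to FMG
is about 20.8, and the extra terms of MF3 add roughly 45% to the SOR time.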

6 Conclusions and Discussion 

We have presented a new fluid deformation algorithm with variable viscosity 
for the registration of images containing structures with variable deformability. 
Results show the algorithm reduces the deformation of selected regions. The 
hierarchical strategy (registration within Gaussian scale space) was not optimal 
for the 3D case since it preferred initial registration at the strongest boundaries 
which were the (non-homologous) outer boundaries. We suggest instead a
model-based hierarchy, using initial registration by an (automated) C-MTQ with
rigid vertebrae, followed by MF3 for more localised deformations.

Inhomogeneities in deformability can be extended to include anisotropies in 
the constraint parameters, such that there are preferential directions of deforma- 
tion. Anisotropies in ease of deformation are common in physical tissue such as 
muscle. Adapting the mathematical representation of the deformation of a physi- 
cal medium to allow anisotropies is more complex than only allowing for isotropic 
inhomogeneities; we leave such a possibility for future research. Another possi- 
ble amendment to the fluid registration is to supply pre-determined uniform and 
constant but non-zero velocities within regions defined as independently moving 
to allow rigid-body transformations within an overall fluid deformation. 

Abstract. We introduce an approach to elastic registration of tomographic
images based on thin-plate splines. Central to this scheme is a
well-defined minimizing functional for which the solution can be stated 
analytically. In this work, we consider the integration of anisotropic land- 
mark errors as well as additional attributes at landmarks. As attributes 
we use orientations at landmarks and we incorporate the corresponding 
constraints through scalar products. With our approximation scheme it 
is thus possible to integrate statistical as well as geometric information as 
additional knowledge in elastic image registration. On the basis of syn- 
thetic as well as real tomographic images we show that this additional 
knowledge can significantly improve the registration result. In particu- 
lar, we demonstrate that our scheme incorporating orientation attributes 
can preserve the shape of rigid structures (such as bone) embedded in 
an otherwise elastic material. This is achieved without selecting further 
landmarks and without a full segmentation of the rigid structures. 



1 Introduction 

Image registration based on point landmarks plays a major role in, e.g., neuro- 
surgery planning and intraoperative navigation. While rigid and affine schemes 
can only describe global geometric differences between images, elastic schemes 
can additionally cope with local differences. Reasons for local geometric differ- 
ences are different anatomy (or pathology), scanner- or patient-induced distor- 
tions, as well as intraoperative deformations due to surgical interventions. 

The most widely applied method for point-based elastic image registration is
based on thin-plate splines. This approach has been introduced into medical
image analysis by Bookstein [2]. Evans et al. applied this scheme to 3D medical
images. Thin-plate splines have a physical motivation, are mathematically
well-founded, and are moreover computationally efficient. Alternative splines
based on the Navier equation, which have been named elastic body splines, have
recently been introduced by Davis et al. Extensions of point-based elastic
schemes which allow additional attributes at landmarks to be included have been
proposed by Bookstein and Green [5] and Mardia and Little. The combination of
thin-plate splines with mutual information as similarity measure for the
purpose of refining initially coarsely specified landmarks has been proposed by
Meyer et al.

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 2.62- l2^ 1999.

All of the approaches above treat the interpolation case. This means that
corresponding landmarks are forced to match exactly and thus it is (implicitly)
assumed that the landmark positions are known exactly. This assumption,
however, is unrealistic since landmark extraction is always prone to error.
Approximation schemes, on the other hand, allow landmark errors to be
incorporated. The error information is used to control the influence of the
landmarks on the registration result, which is important in clinical
applications. Also, the resulting computational scheme is more robust in
comparison to an interpolation approach. However, it seems that approximation
schemes have so far not been a focus of research (but see Bookstein [3], Rohr
et al. and Christensen et al. for exceptions). A more detailed discussion of
these schemes is given below.

This contribution is concerned with an approximation scheme for point-based
elastic image registration using thin-plate splines. Central to this scheme is a
well-defined minimizing functional for which the solution can be stated
analytically. We therefore obtain an efficient computational scheme for
determining the transformation between two images. In earlier work, we
introduced an approach that incorporates isotropic as well as anisotropic
landmark errors, and we proposed a scheme for estimating landmark localization
uncertainties directly from the image data (Rohr et al.). In this contribution,
we suggest a generalization of our work that integrates additional attributes at
point landmarks. In this way, additional knowledge is used to further improve
the registration result without the need to specify additional landmarks. In our
case, we consider orientation attributes at corresponding points. Generally,
these attributes characterize the local orientation of the contours at the
landmarks. In previous work on the incorporation of additional attributes,
Bookstein and Green represented orientations by additional points close to the
landmarks, i.e., they used a finite difference scheme. Mardia and Little
proposed a scheme based on the method of kriging in which exact orientations are
incorporated. Their scheme requires the orientation vectors to be unit vectors.
This imposes constraints which may not be desired. The approach we propose also
includes exact orientations; however, in contrast to Mardia and Little, the
orientation vectors need not be normalized to unit vectors. This is achieved by
representing the constraints due to the orientations through scalar products.
Additionally, we treat the interpolation as well as the approximation case. In
particular, we propose a combined scheme that integrates isotropic as well as
anisotropic errors together with orientation attributes. Also, we extend the
domain of application of our scheme to the important case of preserving rigid
structures (such as bone) embedded in an otherwise elastic material. It seems
that this application has so far not gained much attention in previous work on
point-based registration using attributes (but see Mardia and Little). In
comparison to other schemes such as that of Little et al., a full segmentation
of the rigid structures is not necessary for our approach.






K. Rohr, M. Fornefett, and H. S. Stiehl 



The remainder of this contribution is organized as follows. In the next section, 
we discuss in more detail related work on approximation schemes for point-based 
nonrigid image registration. Then, we describe our approach based on thin-plate 
splines which integrates anisotropic landmark errors and orientation attributes. 
The applicability of the approach is demonstrated for synthetic data as well as 
real tomographic images of the human brain. 



2 Related Work 

In this section, we discuss approximation schemes for point-based nonrigid image
registration. For other approaches to medical image registration we refer to a
recent review by Maintz and Viergever.

In [3], Bookstein proposed an approach to relaxing the original interpolating
thin-plate spline approach by a straightforward combination of different energy
terms, where one term represents the bending energy of interpolating thin-plate
splines and the other the distance of the landmark configurations (note that in
total four different energy terms have been proposed which may be combined).
The basis of the approach is a linear regression model and the technique is
referred to as ‘curve decolletage’ (Leamer). With this approach it is possible
to incorporate isotropic and anisotropic errors. However, since the approach has
not been related to a minimizing functional w.r.t. the searched transformation,
it is generally not clear whether all solutions in the whole function space are
obtained. The approach has been described for 2D datasets and experimental
results have been reported for 2D synthetic data. The landmarks as well as the
corresponding errors have been specified manually.

In [17], we have introduced approximating thin-plate splines for elastic image
registration. Our approach is based on the mathematical work of Duchon and
Wahba, which is a different mathematical framework from that in Bookstein [3].
The basis is a minimizing functional w.r.t. the searched transformation. The
solution in the whole function space can be shown to be unique and can be
stated analytically. While in [17] we have treated the case of isotropic errors,
we have recently incorporated anisotropic errors for the landmarks in both of
the images to be registered. Also, we have proposed to estimate the landmark
localization uncertainties directly from the image data utilizing the
Cramér-Rao bound. The approach has been applied to 2D as well as 3D tomographic
images of the human brain and the landmarks have been localized
semi-automatically using differential operators.

Recently, Christensen et al. introduced a hierarchical approach to image
registration combining a landmark-based scheme with an intensity-based
approach using a fluid model. The landmark scheme is based on the linear
elasticity operator, thus the resulting splines are different from thin-plate
splines. Another difference to our approach is that the nonaffine part of the
transformation is separated from the affine part in their functional. The
approach has been applied to the registration of 3D cryosection data of a
macaque monkey brain as well as to MR images of the human brain. Isotropic
landmark errors have been included in one of the two images to be registered.
Since no further details have been given on how the errors have been
determined, it seems that equal isotropic errors have been used in their
application.

Approximating Thin-Plate Splines for Elastic Registration

As already mentioned above, Bookstein and Green as well as Mardia and Little
introduced nonrigid registration schemes incorporating orientation attributes.
These schemes are based on a finite difference scheme and the method of kriging,
respectively. Both of these works treat only the interpolation case, although a
generalization to approximation is possible in principle.



3 Thin-Plate Splines with Landmark Errors and 
Additional Attributes 

We now describe our approach to elastic image registration based on thin-plate 
splines. This approach incorporates landmark errors as well as orientation at- 
tributes at landmarks. While the landmark errors represent statistical informa- 
tion about the uncertainty of landmark localization, the orientation attributes 
represent geometric information about the contours at the landmarks. Below, 
we first briefly review our scheme incorporating anisotropic landmark errors and 
then describe an extension for incorporating orientation attributes. 



3.1 Anisotropic Landmark Errors 

We denote the sets of landmarks in the two images by $p_i$ and $q_i$,
$i = 1 \ldots n$, and the transformation that maps the two images by $u$ with
components $u_k$, $k = 1 \ldots d$, where $d$ is the image dimension. The
bending energy of thin-plate splines can be written as a function of the order
$m$ of derivatives in the functional as well as the image dimension $d$ as

$$J_m^d(u) = \sum_{k=1}^{d} J_m^d(u_k) \tag{1}$$

where

$$J_m^d(u_k) = \sum_{\alpha_1 + \ldots + \alpha_d = m} \frac{m!}{\alpha_1! \cdots \alpha_d!} \int_{\mathbb{R}^d} \left( \frac{\partial^m u_k}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} \right)^2 dx \tag{2}$$

according to Duchon and Wahba. Under the necessary and sufficient condition
$2m - d > 0$ the functional is bounded.
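As a concrete instance of (2), the following worked expansion specializes the sum to the common case $m = 2$, $d = 2$. The specialization is added here for illustration and is not part of the original derivation. The multi-indices $(\alpha_1, \alpha_2)$ with $\alpha_1 + \alpha_2 = 2$ are $(2,0)$, $(1,1)$, and $(0,2)$, with coefficients $2!/(\alpha_1!\,\alpha_2!)$ equal to 1, 2, and 1.

```latex
J_2^2(u_k) = \int_{\mathbb{R}^2} \left[
    \left( \frac{\partial^2 u_k}{\partial x_1^2} \right)^2
  + 2 \left( \frac{\partial^2 u_k}{\partial x_1 \partial x_2} \right)^2
  + \left( \frac{\partial^2 u_k}{\partial x_2^2} \right)^2
\right] dx,
\qquad 2m - d = 2 > 0 .
```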

Anisotropic landmark errors are represented by covariance matrices $\Sigma_i$.
In this case the minimizing functional reads as

$$J_\lambda(u) = \frac{1}{n} \sum_{i=1}^{n} (q_i - u(p_i))^T \Sigma_i^{-1} (q_i - u(p_i)) + \lambda J_m^d(u) \tag{3}$$

and consists of two terms (see Rohr et al.). The first term measures the
distance between the two landmark sets weighted by the covariance matrices
$\Sigma_i$. The second term represents the smoothness of the transformation, and
the parameter $\lambda$ weights the two terms. Special cases of this
approximation scheme are interpolating thin-plate splines and optimal affine
transformations. The approach is applicable to arbitrary image dimensions $d$,
e.g., 2D and 3D images.
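The two special cases correspond to the limits of the weight $\lambda$ in the functional. A brief sketch, assuming nonsingular covariance matrices $\Sigma_i$ and $m = 2$ (so that the functions with vanishing bending energy are exactly the affine maps); the symbols $A$ and $b$ are introduced here only for illustration:

```latex
\begin{align*}
\lambda \to 0 &: \quad u(p_i) = q_i, \quad i = 1, \ldots, n
  && \text{(interpolating thin-plate spline)} \\
\lambda \to \infty &: \quad J_m^d(u) = 0 \;\Rightarrow\; u(x) = A x + b, \\
  & \quad (A, b) = \arg\min_{A, b} \frac{1}{n} \sum_{i=1}^{n}
    (q_i - A p_i - b)^T \Sigma_i^{-1} (q_i - A p_i - b)
  && \text{(optimal affine transformation)}
\end{align*}
```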
For the functional in (3) there exists a unique analytic solution, which can be
stated as

$$u(x) = \sum_{\nu=1}^{M} a_\nu \phi_\nu(x) + \sum_{i=1}^{n} w_i U(x, p_i) \tag{4}$$

with monomials $\phi$ up to order $m - 1$ and suitable radial basis functions
$U$ (see Wahba, Wang). The coefficients $a = (a_1^T, \ldots, a_M^T)^T$,
$a_\nu^T = (a_{1,\nu}, \ldots, a_{d,\nu})$, and $w = (w_1^T, \ldots, w_n^T)^T$,
$w_i^T = (w_{1,i}, \ldots, w_{d,i})$, of the transformation $u$ can efficiently
be computed through the following system of linear equations:

$$(K + n \lambda W^{-1}) w + P a = v, \qquad P^T w = 0, \tag{5}$$

where $W^{-1}$ represents the landmark errors by
$W^{-1} = \mathrm{diag}\{\Sigma_1, \ldots, \Sigma_n\}$ and is a block-diagonal
matrix. The other matrices in (5) are given by $K = (K_{ij} I_d)$, where
$K_{ij} = U(p_i, p_j)$ and $I_d$ is the $d \times d$ unity matrix, and
$P = (P_{ij} I_d)$, where $P_{ij} = \phi_j(p_i)$. The vector $v$ can be written
as $v = (v_1^T, \ldots, v_n^T)^T$, $v_i^T = (q_{i,1}, \ldots, q_{i,d})$.

Note that our approximation scheme using covariance matrices is also a
generalization of the work in Bookstein [3], where the interpolation case is
solved while the landmarks are allowed to slip along straight lines within a 2D
image. Actually, this is a special case of our approximation scheme since for
straight lines the variance in one direction is zero whereas in the
perpendicular direction it is infinite.
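The linear system for the coefficients $w$ and $a$ can be assembled directly from the two landmark sets. The following is a minimal sketch for 2D images with $m = 2$, assuming the basis function $U(r) = r^2 \ln r / (8\pi)$ and the monomials 1, x, y. The function names (`tps_kernel_2d`, `fit_approx_tps`, `apply_tps`) and the NumPy array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def tps_kernel_2d(r):
    """Assumed 2D basis function U(r) = r^2 ln(r) / (8 pi), with U(0) = 0."""
    out = np.zeros_like(r, dtype=float)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask]) / (8.0 * np.pi)
    return out

def fit_approx_tps(p, q, covs, lam):
    """Solve (K + n*lam*W^-1) w + P a = v, P^T w = 0 for 2D landmarks.

    p, q : (n, 2) arrays of corresponding landmarks
    covs : sequence of n (2, 2) covariance matrices (Sigma_i)
    lam  : regularization weight (lam = 0 gives interpolation)
    """
    n, d = p.shape
    r = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    K = np.kron(tps_kernel_2d(r), np.eye(d))        # blocks K_ij * I_d
    phi = np.hstack([np.ones((n, 1)), p])           # monomials 1, x, y
    P = np.kron(phi, np.eye(d))                     # blocks P_ij * I_d
    M = phi.shape[1]
    W_inv = np.zeros((n * d, n * d))                # diag{Sigma_1..Sigma_n}
    for i, sigma in enumerate(covs):
        W_inv[i * d:(i + 1) * d, i * d:(i + 1) * d] = sigma
    A = np.block([[K + n * lam * W_inv, P],
                  [P.T, np.zeros((d * M, d * M))]])
    rhs = np.concatenate([q.reshape(-1), np.zeros(d * M)])
    sol = np.linalg.solve(A, rhs)
    w = sol[:n * d].reshape(n, d)                   # row i holds w_i
    a = sol[n * d:].reshape(M, d)                   # row nu holds a_nu
    return w, a

def apply_tps(x, p, w, a):
    """Evaluate u(x) = sum_nu a_nu phi_nu(x) + sum_i w_i U(|x - p_i|)."""
    r = np.linalg.norm(x[:, None, :] - p[None, :, :], axis=-1)
    phi = np.hstack([np.ones((len(x), 1)), x])
    return phi @ a + tps_kernel_2d(r) @ w
```

With `lam = 0` the fitted transformation reproduces the target landmarks exactly; larger values of `lam`, or larger covariance matrices, let the transformation deviate from individual landmarks in favor of smoothness.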

3.2 Landmark Errors and Orientation Attributes 

The approach described above can be further generalized to include additional
attributes at landmarks. In our case, we incorporate orientation attributes.
These attributes characterize the local orientation of the contours at the
landmarks and represent additional knowledge for elastic image registration.

At corresponding landmarks we assume to have orientations which we want to
match (note that these landmarks are generally a subset of the overall
landmarks). We denote those landmarks in the first and second image by
$p_{\theta_i}$ and $q_{\theta_i}$ and the corresponding orientations by $d_i$
and $e_i$, respectively. To define a matching criterion between the
orientations, we need the transformed vector of $d_i$. This vector can be stated
as $(d_i^T \nabla) u(p_{\theta_i})$. Now we require that this transformed vector
is perpendicular to $e_{i,k}^{\perp}$, which are the k-th orthogonal vectors to
the orientation vector $e_i$ in the second image. In this case, the scalar
product between the vectors is zero, otherwise it is different from zero.
Choosing vectors from the orthogonal space has the advantage that the
corresponding scalar product is zero independently of the length of the vectors.
This is an advantage over the approach in Mardia and Little (see also Mardia et
al.), where the orientation vectors are required to be unit vectors and where
only the interpolation case has been treated. In our work, we treat both the
interpolation and the approximation case. Note, however, that the property of
length independence only holds in the case of interpolation, but not for
approximation. In general we have $d - 1$ perpendicular orientations which
constrain the transformed orientation vector of the first image to lie on a
line. If the number of perpendicular orientations is smaller, i.e., the number
of constraints is lower, then the transformed orientation vector is not
constrained w.r.t. a line, but w.r.t. a plane, for example (see also Fornefett
et al.).
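The matching criterion can be illustrated numerically. The sketch below assumes that the Jacobian of the transformation at a landmark is available as a matrix, so that the transformed orientation is the product of the Jacobian with the orientation vector of the first image. The helper names `orthogonal_complement` and `orientation_residuals` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def orthogonal_complement(e):
    """Return d-1 orthonormal vectors perpendicular to the orientation e."""
    e = np.asarray(e, dtype=float)
    # For a 1 x d matrix, the rows of vt after the first span the
    # subspace orthogonal to e.
    _, _, vt = np.linalg.svd(e.reshape(1, -1))
    return vt[1:]

def orientation_residuals(jacobian, d_vec, e_vec):
    """Scalar products between the transformed orientation and the
    vectors perpendicular to e_vec; all zero when the orientations match."""
    jacobian = np.asarray(jacobian, dtype=float)
    d_vec = np.asarray(d_vec, dtype=float)
    transformed = jacobian @ d_vec   # directional derivative of u along d_vec
    return orthogonal_complement(e_vec) @ transformed
```

Because only perpendicularity is tested, rescaling either orientation vector leaves a zero residual at zero, which is the length independence described above for the interpolation case.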

Having defined the matching criterion between orientations we can now state
the generalized minimizing functional as

$$J_\lambda(u) = \frac{1}{n} \sum_{i=1}^{n} (q_i - u(p_i))^T \Sigma_i^{-1} (q_i - u(p_i)) + \frac{1}{n_2'} \sum_{i=1}^{n_\theta} \sum_{k=1}^{d-1} \left( \left( (d_i^T \nabla) u(p_{\theta_i}) \right)^T e_{i,k}^{\perp} \right)^2 + \lambda J_m^d(u) \tag{6}$$

where $n_2' = n_2 / c$, $c > 0$, and $n_2 = n_\theta (d - 1)$. In comparison to
the functional (3) from above we have an additional term that incorporates the
orientation constraints. $n_\theta$ is the total number of orientations in each
of the images. The parameter $c$ weights the orientation term w.r.t. the term
representing the landmark errors and also determines (besides $\lambda$)
whether we interpolate or approximate the orientations. Note that we can
incorporate an arbitrary number of orientations at each landmark. As described
above, the orientation constraints are incorporated by scalar products between
the transformed orientations of the first image and orientations perpendicular
to the orientations in the second image. The solution to the functional in (6)
can be stated as

$$u(x) = \sum_{\nu=1}^{M} \sum_{k=1}^{d} a_{k,\nu} \phi_\nu(x) e_k + \sum_{i=1}^{n} \sum_{k=1}^{d} w_{1,k,i} U(x, p_i) e_k - \sum_{i=1}^{n_\theta} \sum_{k=1}^{d-1} w_{2,k,i} (d_i^T \nabla) U(x, p_{\theta_i}) e_{i,k}^{\perp} \tag{7}$$

with monomials $\phi$ up to order $m - 1$ and radial basis functions $U$ as
above. $e_k$, $k = 1 \ldots d$, are the canonical basis vectors of
$\mathbb{R}^d$. The solution is analogous to (4) from above, but additionally we
have a term that represents the orientation constraints. Note that in order to
obtain bounded functionals the used function
space has to be constrained. Choosing to = 2 for the order of derivatives of the 
smoothness term, then for both cases of 2D and 3D images (d = 2, 3) incorporat- 
ing orientations, we have the basis function C/(x) = |x|^. The parameter vectors 
a= (af’,...,a^)'^, af = (ai,i, ..., and w = Eyi, 

..., W 2 ,d-i.i) of the transformation u can 

be computed by solving the linear system of equations

$$K w + P a = v, \qquad P^T w = 0, \tag{8}$$

with

$$K = \begin{pmatrix} K_1 + n \lambda W^{-1} & K_2 \\ K_3 & K_4 + n_2' \lambda I_{n_2} \end{pmatrix} \tag{9}$$




where = diagj^i, . . . , S„} as in 0 and 1„2 is the ri2 x n2 unity matrix 
with 71-2 = ng^d—l). The other matrices in @ are given by Ki = where 

^i,ij ~ U{Pi,Pj) and 1(7 is the d X d unity matrix; K2 = {K2^ijF j), where K2^ij = 
-{dJW)U{pi,pe-), Fj^ki = and Fj are d x (d — 1) matrices; K3 = K|’ ; 

K4 = {Ki^ijF.j), where = -(dJV)(df V)C/(pe,, pej, Eij^ki = 
and Fij are (d — 1 ) x (d — 1 ) matrices; Pi = (Piyjld), where Pi^ij = (j)j{pi), 
and P2 = (P2,ijFf), where P2,^j = (df V)(/)j (pej. K and P are of dimension 
n' X n' and n' x dM, resp., with n' = nd + ng{d — 1). The vector v is given by 
V = (v[, v^,0, vf = (q,^i, ...,qi^d), with ri2 zeros at the end. 
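The size of the combined linear system follows from the counts given above: $n' = nd + n_\theta(d-1)$ rows for the landmark and orientation constraints and $dM$ polynomial coefficients. The small helper below is a sketch for checking these dimensions. It assumes that $M$ counts all monomials of degree at most $m - 1$ in $d$ variables, i.e., $M = \binom{d+m-1}{d}$; the function name `system_dimensions` is hypothetical.

```python
from math import comb

def system_dimensions(n, n_theta, d, m=2):
    """Dimensions of the linear system with orientation attributes.

    n       : number of point landmarks
    n_theta : total number of orientations
    d       : image dimension
    m       : order of derivatives in the smoothness term
    """
    n2 = n_theta * (d - 1)        # number of orientation constraints
    n_prime = n * d + n2          # rows of K and P
    M = comb(d + m - 1, d)        # assumed count of monomials up to order m-1
    return {
        "n2": n2,
        "n_prime": n_prime,
        "M": M,
        "K_shape": (n_prime, n_prime),
        "P_shape": (n_prime, d * M),
    }
```

For example, eight 2D landmarks with eight orientations in total give a 24 x 24 matrix K and a 24 x 6 matrix P.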



4 Experimental Results 



We demonstrate the applicability of our approach using synthetic data as well as 
real tomographic images of the human brain. In the first two experiments we have 
incorporated either anisotropic landmark errors only or orientation attributes 
only. For the last two experiments we have integrated both landmark errors 
(isotropic as well as anisotropic errors) and orientation attributes. 

In the first example, we register the 2D MR brain images of different patients
displayed in Fig. 1. We have used normal landmarks and quasi-landmarks. In
contrast to normal landmarks, quasi-landmarks have no unique position, e.g.,
arbitrary edge points. The incorporation of quasi-landmarks is important
since normal point landmarks are hard to define at the outer parts of the human
head. For all landmarks the covariance matrices have been estimated directly
from the image data by utilizing the Cramér-Rao bound

$$\Sigma_g = \frac{\sigma_n^2}{m} C_g^{-1} \tag{10}$$

where $\sigma_n^2$ denotes the variance of additive white Gaussian image noise,
$m$ the number of voxels in a local 3D window, and
$C_g = \overline{\nabla g (\nabla g)^T}$ is the averaged dyadic product of the
image gradient (Rohr, van Trees [20]). Note that the Gaussian noise model is an
approximation and that we assume that the dependence of the noise on the signal
can be neglected (but see Abbey et al.). In Fig. 1 the landmark localization
uncertainties are represented by error ellipses (note that the ellipses have
been enlarged by a factor of 7 for visualization purposes). It can clearly be
seen that for the normal landmarks the localization uncertainty is small in all
directions, while for the quasi-landmarks (landmarks no. 9-12) the localization
uncertainty is large along the edge but small perpendicular to it. Fig. 2 on the
left shows the registration result when using only the normal landmarks for
elastic image registration (landmarks no. 1-6 and 8). Here, we have applied our
approximating thin-plate spline approach while incorporating isotropic errors
and setting $m = d = 2$ in (3). We have transformed the first image and have
overlaid it onto the computed edges of the second image. While the registration
accuracy within the inner parts of the brain is quite good, at the outer parts
there are larger errors. If instead we use both the normal landmarks and the
quasi-landmarks while incorporating anisotropic errors, then we can
significantly improve the registration accuracy as shown in Fig. 2 on the right.

Fig. 1. MR data sets of different patients: normal landmarks, quasi-landmarks, and
estimated error ellipses (enlarged by a factor of 7)

Fig. 2. Registration results: Thin-plate spline approximation using normal landmarks
along with equal scalar weights (left), and using normal landmarks, quasi-landmarks
and estimated covariance matrices (right)
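The covariance estimate from the Cramér-Rao bound, i.e., the noise variance divided by the number of voxels in the window times the inverse of the averaged dyadic product of the image gradient, can be sketched as follows. The sketch assumes finite-difference gradients computed with `numpy.gradient` over a small window and a known noise standard deviation; the function name `localization_covariance` is hypothetical, and the authors' actual gradient operator is not specified here.

```python
import numpy as np

def localization_covariance(window, sigma_n):
    """Estimate the landmark localization covariance in a local window.

    window  : d-dimensional array of intensities around the landmark
    sigma_n : assumed standard deviation of the additive Gaussian noise
    Returns a (d, d) covariance matrix, ordered like the array axes.
    """
    grads = np.gradient(np.asarray(window, dtype=float))
    g = np.stack([gi.ravel() for gi in grads], axis=1)   # (m, d) gradients
    m = g.shape[0]                                        # voxels in window
    C = g.T @ g / m                # averaged dyadic product of the gradient
    return sigma_n ** 2 / m * np.linalg.inv(C)
```

For an elongated structure the gradient is weak along the structure and strong across it, so the estimated variance is large along the structure and small perpendicular to it, matching the error ellipses described for the quasi-landmarks. A window containing a perfectly straight edge makes the averaged dyadic product singular, in which case the inverse does not exist.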

With the second example we demonstrate the usefulness of incorporating
orientation attributes at landmarks. With the two synthetic images in Fig. 3
we simulate the rotation of a rigid structure (such as bone) embedded in an
otherwise elastic material. If we use point landmarks only (four landmarks at
the rigid structure and four landmarks at the image corners), then we obtain
the result shown in Fig. 4 on the left. We see that the whole image including
the rigid structure is elastically deformed. Next, we have incorporated
orientations at the landmarks of the rigid structure. In all our experiments
incorporating orientations we used $c = 1$ and $m = 2$ for the functional in
(6). At each of the four landmarks of the rigid object in Fig. 3 we have
specified two orientations which are aligned with the contours of the object.
Using this additional knowledge for image registration significantly improves
the result, i.e., the shape of the rigid object is well preserved (Fig. 4 on the
right). Previously, Little et al. have considered the problem of preserving
rigid structures within elastic material. However, in their approach a full
segmentation of the rigid structures is necessary. With our scheme we needed
neither a full segmentation nor additional point landmarks.

Fig. 3. Synthetic images simulating the rotation of a rigid structure in an otherwise
elastic material

Fig. 4. Registration results: Interpolating thin-plate splines using only point landmarks
(left) and incorporation of orientations at landmarks (right)

In the third example we treat the case of several rigid structures embedded
in elastic material. Fig. 5 shows two synthetic images that simulate the bending
of a spine which is represented by five rigid components. The registration
result in Fig. 6 on the left is obtained if we apply interpolating thin-plate
splines while using four landmarks for each rigid component as well as four
image border landmarks. Fig. 6 in the middle shows the result if we include
two orientations at each landmark of the rigid components, while still applying
an interpolation scheme. It can be seen that the shape of the rigid structures
is better preserved; in particular, the outer contours of the rigid components
are not curved as in the case of using point landmarks only. A further
improvement is obtained if we use both the point landmarks and the orientations
but apply an approximation scheme (Fig. 6 on the right). Here we have used equal
isotropic landmark errors in the functional in (6). From the result it can be
seen that the contours of the rigid components are straight and now also the
gridlines within the rigid components are nearly straight. Thus the shape of the
rigid structures is better preserved.

With the last example we show an application where we have integrated
both anisotropic landmark errors and orientation attributes. In Fig. 7 two MR
images of different patients are shown. We have selected normal point landmarks
and quasi-landmarks, and we have estimated the error ellipses directly
from the image data. If we use only the normal landmarks (9 landmarks; no.
1, 2, 4, 7, 10, 11, 16, 17, 18) and apply interpolating thin-plate splines, then
we obtain the result shown in Fig. 8 on the left. Deviations can be observed in
the regions where no landmarks have been specified, particularly at the upper
part of the brain and at the corpus callosum. Next, we have used the normal
landmarks from above together with three quasi-landmarks at the skin contour
(landmarks no. 25, 26, 27). For all landmarks we have automatically estimated
the covariance matrices and we have applied the approximating thin-plate spline
approach incorporating anisotropic errors. From Fig. 8 on the right it can be
seen that the registration accuracy at the upper part of the brain is now much
better while at the corpus callosum there is still a larger deviation. We can
further improve the result in this region if we additionally integrate
orientations at landmarks. In this example, we have included one orientation at
landmark no. 1 (genu of corpus callosum). In both images this orientation points
to the top of the corpus callosum. From Fig. 9 we see that we now obtain a
significantly better registration accuracy of the whole corpus callosum.

5 Summary and Future Work 

In this contribution, we have proposed an approach to elastic registration of
medical images that is based on point landmarks and additional attributes. Our
scheme is based on a minimizing functional which covers the full range from
interpolation to approximation. Since the solution can be stated analytically,
we obtain an efficient computational scheme. Central to this work is the
integration of anisotropic landmark errors and orientation attributes at
landmarks. In this way we incorporate statistical as well as geometric
information as additional knowledge in elastic image registration. We have
demonstrated that this additional knowledge can significantly improve the
registration result. In particular, we have shown that by incorporating
orientation attributes it is possible to preserve the shape of rigid structures
(such as bone) in an otherwise elastic material. This can be achieved without
selecting further landmarks and without a full segmentation of the rigid
structures.

Fig. 5. Synthetic images simulating a spine that is bent

Fig. 6. Registration results: Interpolating thin-plate splines using only point landmarks
(left), integration of two orientations at each object landmark (middle), and
approximating thin-plate splines using point landmarks and orientations (right)

Fig. 7. MR images of different patients: normal landmarks, quasi-landmarks, and
estimated error ellipses

Fig. 8. Registration results: Interpolating thin-plate splines using normal landmarks
(left), and approximating thin-plate splines using normal landmarks, quasi-landmarks
and estimated covariance matrices (right)

Fig. 9. Registration result: Approximating thin-plate splines using normal landmarks,
quasi-landmarks, estimated covariance matrices, and orientations

One problem with our approach is that the influence of incorporated
orientations is rather global, i.e., image parts further away from the positions
of added orientations are often strongly affected, which is generally not
desired. This observation has already been made earlier (see Mardia et al.). In
future work, means have to be found to constrain this global influence. Another
topic for further research is the automatic estimation of the orientation
attributes. While for rigid structures within elastic material the local
orientation of the contour seems to be quite appropriate, for elastic material
other choices, which rather reflect the global geometry of anatomical
structures, seem to be better suited.
Abstract. In this paper we describe a statistical model for the observation
of labeled points in gated cardiac single photon emission computed
tomography (SPECT) images. The model has two major parts: one based 
on shape correspondence between the image for evaluation and a refer- 
ence image, and a second based on the match in image features. While 
the statistical deformation model is applicable to a broad range of image 
objects, the addition of a contraction mechanism to the baseline model 
provides particularly convincing results in gated cardiac SPECT. The 
model is applied to clinical data and provides marked improvement in 
the quality of summary images for the time series. Estimates of heart 
deformation and contraction parameters are also obtained. 



1 Introduction 

In the SPECT modality, a patient is injected with a radiotracer compound and
an image is recorded using photons emitted from the tissue where the
compound accumulates. In traditional SPECT, a single 3D volume is imaged.
However, when applied to cardiac imaging, motion artifacts result in a serious
degradation of the reconstructed image. To overcome this problem, gated cardiac
SPECT can be used to divide the image acquisition period into n subsegments
or gates based on the patient’s electrocardiogram. If this is done, n SPECT
images are acquired in parallel over several heart beats, one image
corresponding to each gate. This imaging technique provides a useful diagnostic
tool for direct evaluation of heart tissue damage, since the image artifacts due
to the motion of the heart are much reduced. However, its success depends
heavily on the compilation of data across gates, since each gated image is based
on a relatively low number of photon counts (compared to the traditional
approach). A reasonable summary of the n images can only be obtained once the
physical deformation of the heart through time is accurately modeled. The
resulting summary image will then ideally have a signal-to-noise ratio (SNR)
comparable to the traditional ungated SPECT images but with minimal motion blur.

We propose a method for modeling and summarizing gated cardiac SPECT 
data by tracking anatomical points within the heart volume through the series 
of gated images. Based on this tracking, a composite image can be calculated 
by summing each gate’s image intensity at voxels dictated by the estimated 
path of each anatomical point through time. This procedure offers the potential 
of combining the benefits of better delineation of the heart walls with more 
accurate estimates of the tissue perfusion as measured by radiotracer compound 
uptake. 
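The summation along tracked paths can be sketched in a few lines. The example assumes that the estimated path of each anatomical point is already available as integer voxel indices for every gate (nearest-voxel lookup, no interpolation); the function name `composite_image` and the array layout are hypothetical and serve only to illustrate the idea.

```python
import numpy as np

def composite_image(gates, paths):
    """Sum gated intensities along the estimated path of each point.

    gates : array of shape (G, *image_shape), one image per gate
    paths : integer array of shape (G, P, d) giving, for every gate, the
            voxel index of each of the P tracked anatomical points
    Returns an array of P summed intensities, one per tracked point.
    """
    gates = np.asarray(gates, dtype=float)
    paths = np.asarray(paths)
    total = np.zeros(paths.shape[1])
    for image, idx in zip(gates, paths):
        # idx has shape (P, d); index the image at each tracked voxel
        total += image[tuple(idx.T)]
    return total
```

A point that follows a moving bright region accumulates its intensity from all gates, whereas summing the gates voxel by voxel without tracking would spread that intensity over several voxels.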

Several previous approaches have been taken in similar problems. Klein et
al. have applied an approach which matches gated positron emission tomography
(PET) images based on image values, smoothness of the motion field and
physical incompressibility. Tagged magnetic resonance imaging has also been
used to identify specific regions of heart tissue and match those through a time
series. There exist a variety of other approaches to modeling heart
deformations in medical images based, for example, on surface matching. Several
“ground truth” studies for determining heart tissue motion have been carried
out, for example by Potel et al., who have performed marker-based direct
measurements of the motion of surface points, and numerical phantoms of the
heart have been used in dose calculations and simulations. Similar approaches
for deformation modeling to those offered here have been successfully
applied to the case of finding image features, for example, using active shapes,
snakes, and landmark-type methods. The deformation method as described for a
single image is comparable to the work of Collins et al., in which a
varying-resolution grid is used to register MR brain images to an atlas
by balancing constraints on grid continuity and a local feature function match.
A similar multiresolution approach was applied in the work by Klein et al.
The multiresolution approach to maximization (and the description of images in
general) is well-documented in the literature, particularly within the
scale-space framework (see e.g. [12] or [22]) and work in the context of optical
flow.

Our general facet model approach, described previously, is well suited to gated
cardiac SPECT since it provides a method for calculating the deformations
required to trace a set of anatomical points through a set of images. Facet
models are based on a large number of landmark-like points, termed facets,
acting on a set of images drawn from a common class. Facets combine ideas from
many of the approaches cited above in a framework which is intended to model
observer placement of an arbitrary number of points in an image based on
knowledge of those points in a reference image. This is accomplished via a
probability distribution defined on facet locations and on image feature values
at the facet locations.

In this paper, we extend the facet model to incorporate a gross model for
heart contraction by including a set of contraction parameters in the shape
portion of the probability distribution. Facet motion results are obtained by
maximizing the joint distribution on facet locations for each image in the
series. Several methods for exploring the high-dimensional results are
exhibited, in order to show the utility of the method for the gated cardiac
SPECT application.

Fig. 1. Basic model element (for image dimension d = 1), shown for the levels
(l-1), l, and (l+1) of the hierarchy. Each facet has an associated parameter
pair (μ, x), positions in the reference image and an image for evaluation,
respectively. The facet can also contain the pair (φ, f), which represent a
reference image feature and the image feature observed at the estimated facet
location x, respectively. Facets are modeled jointly in a hierarchy (see
Sect. 2.1), where they are influenced directly by 2^d facets one level up and in
turn influence 4^d facets on the next level down

2 Model and Methods 

In the following subsections, the statistical model is defined. First, the
general facet model is briefly reviewed; more details are given in the earlier
descriptions of the model. This model is generally applicable to a broad range
of image modalities. However, we introduce several model extensions specific to
the gated cardiac SPECT application, based on a priori knowledge of general
heart motion. Finally, the details of implementation are discussed.

2.1 General Model 

The process that we wish to model is the placement of labeled points, or facets, 
within an image by a human observer. Facets differ from landmarks (following 
Bookstein|E]) in that facets do not correspond to specific pre-deflned anatomical 
or mathematical features. Instead, each facet’s label is generally inferred by its 
location in a reference image. A set of facets is applied hierarchically (see Fig. 
[3 to capture deformation on several levels of coarseness as inspired by models 
of visionP3|. Each facet has an associated position x, and may also have an 
associated feature value /, depending on its location in the hierarchy (see Sect. 
2.1.1). A joint distribution is defined for all facet positions and feature values. 
We model the vectors x and / as conditionally independent given a parameter 
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vector 9 = {9x,9f}-, that is 

p{x, f\9) = Ps{x\&x)piU\Of)- (1) 

The parameter vector $\theta_x$ contains location and scale parameters $\{\mu, \kappa, \sigma^2\}$ for
the facet location (shape) portion of the distribution, and $\theta_f$ similarly contains
parameters $\{\phi, \tau\}$ for the feature (image) part of the distribution. These
parameters are described below.



Shape Distribution Let $x$ be the vector of facet positions. Let $x_l$ indicate the
vector of facets at level $l$ in the hierarchy and let $x_{lj}$ refer to an individual facet
$j$ in that level. Similarly, let $\mu$ be the vector of corresponding facet locations
in the reference image $T$. The distribution $p_S(x \mid \theta_x)$ is then assumed to have a
hierarchical normal structure defined by (2) and (3). Each level has $N_l$ facets, and
$d$ is the dimension of the image. For $(L+1)$ levels in the hierarchy, $l \in \{0, \ldots, L\}$,
define

$$p_S(x \mid \theta_x) = p_S(x_0 \mid \theta_x)\, p_S(x_1 \mid x_0; \theta_x) \cdots p_S(x_L \mid x_{L-1}; \theta_x), \tag{2}$$

where each factor is a density of the form

$$\begin{aligned}
p_S(x_0 \mid \kappa, \mu) &= \mathrm{MVN}(\mu_0,\ \kappa\sigma_0^2 I_{N_0 d}) \\
p_S(x_1 \mid x_0; \kappa, \mu) &= \mathrm{MVN}(\mu_1 + A_1 \Delta x_0,\ \kappa\sigma_1^2 I_{N_1 d}) \\
&\ \ \vdots \\
p_S(x_L \mid x_{L-1}; \kappa, \mu) &= \mathrm{MVN}(\mu_L + A_L \Delta x_{L-1},\ \kappa\sigma_L^2 I_{N_L d}).
\end{aligned} \tag{3}$$



Here, $\mathrm{MVN}(a, \Sigma)$ denotes the multivariate normal density with mean $a$ and
covariance matrix $\Sigma$. The vectors $x_l$ and $\mu_l$ have length $N_l d$, $\Delta x_l = x_l - \mu_l$,
$I_n$ is the $n$ by $n$ identity matrix, and $A_l$ is an $N_l d$ by $N_{l-1} d$ design matrix
for the hierarchical model. For example, in one dimension, $A_l$ might be defined
schematically (refer also to Fig. 1) as a sparse matrix of weights $w_{jk}$,
where the $w_{jk}$ values are set proportional to the inverse distance between $\mu_{lj}$
and $\mu_{(l-1)k}$ and constrained by $\sum_k w_{jk} = 1$. Only the $2^d$ closest facets
on level $(l-1)$ to facet $lj$ are given non-zero weights $w_{jk}$. In the 3-dimensional
application presented here, such entries in a row of $A_l$ are thus kept limited to 8
by extending the above $A_l$ in the most obvious way to 3 dimensions. Each level in
the hierarchy is laid out such that $\mu_l$ forms an evenly spaced grid with $N_l = 2^{ld}$
facets per level. Finally, $\sigma_l^2$ is the conditional variance of a facet on level $l$ given
the locations of facets on level $(l-1)$. These parameters were set such that the
marginal variance for bottom level facets is approximately independent of the
number of levels in the hierarchy. The parameter $\kappa$ is an overall scale factor to
allow for adjustments in the weighting of shape ($p_S$) versus image ($p_I$) portions
of the density in (1). This parameter was set empirically.
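
As an illustration of how such a design matrix could be assembled, the following Python sketch builds a one-dimensional $A_l$ between two evenly spaced grid levels. It assumes inverse-distance weights on the two closest parent facets, normalized so that each row sums to one. The function names (`grid`, `design_matrix_1d`) and the grid length of 16 are hypothetical choices for this example and are not taken from the authors' implementation.

```python
def grid(level, length=16.0):
    """Evenly spaced 1D grid of 2**level facet locations (cell centres)."""
    n = 2 ** level
    step = length / n
    return [(i + 0.5) * step for i in range(n)]


def design_matrix_1d(child_pos, parent_pos, n_parents=2):
    """Return A as a list of rows; A[j][k] is the weight of parent k for child j.

    Assumption: only the n_parents closest parents get non-zero weight,
    proportional to inverse distance, and each row is normalized to sum to 1.
    """
    rows = []
    for c in child_pos:
        dist = [abs(c - p) for p in parent_pos]
        nearest = sorted(range(len(parent_pos)), key=lambda k: dist[k])[:n_parents]
        row = [0.0] * len(parent_pos)
        zero = [k for k in nearest if dist[k] == 0.0]
        if zero:
            # Child coincides with a parent: that parent gets all the weight.
            row[zero[0]] = 1.0
        else:
            inv = {k: 1.0 / dist[k] for k in nearest}
            total = sum(inv.values())
            for k, v in inv.items():
                row[k] = v / total
        rows.append(row)
    return rows


# Level 2 (4 facets) expressed in terms of level 1 (2 facets).
A2 = design_matrix_1d(grid(2), grid(1))
for row in A2:
    print([round(w, 3) for w in row])
```

In three dimensions the same idea would use the 8 closest parents per row, as stated in the text.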

The form for $p_S$ captures the deformation on several levels of scale, thus
easing the exploration of configuration space. This means that gross deformations
are modeled by upper-level facets and that lower-level facets will not be
subsequently penalized for the same movement. The form chosen for $A_l$ enforces some
smoothness on the deformation, i.e., the marginal covariance between any pair
of facets on a level $l$ is a smooth decreasing function of their distance $|\mu_{lj} - \mu_{lj'}|$
in the reference image [?]. This is unlike previous models in which only one
non-zero term per row was used [?]. Computational tractability is ensured by
the choice of a normal hierarchical model.



Feature Distribution Let $f$ be the vector of image-derived feature values
associated with the set of facets. Similarly, let $\phi$ be the corresponding vector of
reference image feature values. Then given an image match function $g$, we model
the facet features as drawn from the exponential family distribution given in (4).
Here, we further assume that if $\{f_j, \phi_j\}$ are the corresponding $j$th elements of
the vectors $\{f, \phi\}$, then the feature distribution $p_I$ is modeled as a product of
univariate distributions with a common image match function $g_1$. Thus,

$$p_I(f \mid \tau, \phi) \propto \exp\{-g(f, \phi)\} \propto \exp\Big\{-\frac{1}{\tau} \sum_{j=1}^{N_L} g_1(f_j; \phi_j)\Big\}, \tag{4}$$



where $\tau$ is an overall scale parameter which is similar to $\kappa$ defined for the shape
distribution $p_S$. The sum extends over those facets that have associated feature
values, which in this paper are the facets on the lowest level $L$ in the hierarchy.
The means $\phi_j$ are taken to be the image feature value calculated at the
locations $\mu_{Lj}$ in the reference image $T$: $\phi_j = \phi(T, \mu_{Lj})$. Note that this does
not necessarily imply taking the image value at $\mu_{Lj}$ directly. The derivation of
$\{f_j, \phi_j\}$ from the image data can be specified in a number of ways; we have
used either a quantile-rescaled image intensity or a low-scale image Laplacian.
For choices of the functions $g$, scaled squared differences between $f_j$ and $\phi_j$
(yielding independent normal distributions) have been used [?], and a local
intensity regression, as described in the next subsection, has also been employed.

2.2 Application-Specifics

The baseline shape model (2) captures the shape changes in the general case
when little prior knowledge is available about the shape change within the image
class. Since we know a priori that the heart size changes during the heart beat
cycle, it is sensible to build this into the model. We choose a simple scalar
correction, due in part to the coarseness of the images: the left ventricular
(LV) wall is approximately 3 pixels wide in a 2D slice. Also, non-linear
contractions not incorporated into the gross model can still be accommodated
by the deformation model.

To define the contraction model, let $\gamma_1$ be coordinates for a center of
contraction, and let $\gamma_2$ be a 3-dimensional set of contraction factors for orthogonal
directions {1,2,3}. The full vector $\gamma = \{\gamma_1, \gamma_2\}$ transforms $\mu$ in the shape
portion of the model (1) to a vector of contracted means $\mu^c$ at time $t$ in the image
time series, and the shape distribution $p_S$ is modified accordingly:

$$\begin{aligned}
p_S(x_l \mid x_{l-1}; \kappa, \mu, \gamma(t)) &= \mathrm{MVN}(\mu_l^c + A_l \Delta x_{l-1},\ \kappa\sigma_l^2 I_{N_l d}), \\
\mu_l^c &= \gamma_1 1_{N_l d} + (\mu_l - \gamma_1 1_{N_l d})\, \tilde{\gamma}_2(t),
\end{aligned} \tag{5}$$

where $\tilde{\gamma}_2$ is a stacked vector of $N_l$ replicates of $\gamma_2$ and $1_n$ is the $n$-dimensional
vector of ones. Otherwise, $A_l$, $\kappa\sigma_l^2$ and $\mu$ remain unchanged. A prior distribution
can also be included to capture the expected contraction pattern during the beat
cycle, for example

$$p(\gamma_2(t)) = \mathrm{MVN}(\omega(t),\ \nu^{-1} I_3), \quad \{\gamma_2^1(t), \gamma_2^2(t), \gamma_2^3(t)\} \in \langle 0, \infty \rangle, \tag{6}$$

yielding the final form for the joint shape distribution,

$$p_S(x, \gamma_2 \mid \theta_x) = p_S(x \mid \theta_x, \gamma_2)\, p(\gamma_2). \tag{7}$$

The center of contraction $\gamma_1$ is fixed in this implementation at $\mu_0$, since having
it vary for a scalar contraction only involves a non-informative translation of
the reference grid. Note that the introduction of $\gamma$ does not change the values
of $\phi_j$, which are still taken as the reference image feature values at $\mu_{Lj}$. (For a
graphical outline representation of the model, see Fig. 2.)
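
The contraction of the reference means can be sketched in a few lines of Python. This is a minimal illustration assuming that each coordinate of a facet location is scaled about the center of contraction by the factor for that direction; the function name `contract_means` and the example coordinates are hypothetical.

```python
def contract_means(mu, center, gamma2):
    """Scale each reference location about `center`, one factor per direction.

    mu     : list of d-dimensional facet locations in the reference image
    center : d-dimensional center of contraction (gamma_1 in the text)
    gamma2 : d contraction factors (gamma_2 in the text)
    """
    return [
        tuple(c + (m - c) * g for m, c, g in zip(point, center, gamma2))
        for point in mu
    ]


mu = [(0.0, 0.0, 0.0), (16.0, 16.0, 16.0), (8.0, 8.0, 8.0)]
center = (8.0, 8.0, 8.0)

# Contract by half in the first two directions, leave the third unchanged.
mu_c = contract_means(mu, center, (0.5, 0.5, 1.0))
print(mu_c)
```

A factor of 1 in every direction leaves the grid unchanged, and a facet located at the center of contraction does not move for any choice of factors.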

The feature function $g_1$ found to be most effective for gated cardiac SPECT
images is one based on a local intensity regression around the facet in question.
A small neighborhood defined by a set of $m$ points is placed around the facet's
position in the reference image $T$ (around $\mu_{Lj}$) and observed image $Q$ (around
$x_{Lj}$). Subsequently, these points are evaluated in $T$ and $Q$, respectively, to form
$m$-vectors $\phi_j$ and $f_j$, indexed by $k$. A normalized regression parameter (see [?]
for details) is then calculated,

$$\frac{\big(\sum_k \phi_{jk} f_{jk} - \frac{1}{m} \sum_k \phi_{jk} \sum_k f_{jk}\big)^2}{\big(\sum_k \phi_{jk}^2 - \frac{1}{m} (\sum_k \phi_{jk})^2\big)\big(\sum_k f_{jk}^2 - \frac{1}{m} (\sum_k f_{jk})^2\big)} \tag{8}$$

which is then used to define the distribution function $p_I$ (4).
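
A normalized regression statistic of this kind can be computed as follows. The sketch assumes the statistic is the squared sample correlation between the two neighborhood vectors, which is one reading of (8). The function name `squared_correlation` and the handling of a constant neighborhood are choices made for this example, and the step from this value to $g_1$ is not shown because the text does not specify it.

```python
def squared_correlation(phi, f):
    """Squared sample correlation between reference values phi and observed values f."""
    m = len(phi)
    s_pf = sum(p * q for p, q in zip(phi, f))
    s_p, s_f = sum(phi), sum(f)
    s_pp = sum(p * p for p in phi)
    s_ff = sum(q * q for q in f)
    num = (s_pf - s_p * s_f / m) ** 2
    den = (s_pp - s_p ** 2 / m) * (s_ff - s_f ** 2 / m)
    if den == 0.0:
        # Assumption: a constant neighborhood carries no match information.
        return 0.0
    return num / den


phi_j = [1.0, 2.0, 3.0, 4.0]
print(squared_correlation(phi_j, [5.0, 7.0, 9.0, 11.0]))   # linear relation
print(squared_correlation(phi_j, [1.0, -1.0, 1.0, -1.0]))  # weak relation
```

Because the statistic is invariant to a linear rescaling of either vector, it is insensitive to overall intensity differences between gates.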

Fig. 2. Conceptual model framework. The facets at locations $\mu$ are placed in the
reference image to obtain the feature vector $\phi$. The deformation is then modeled as a
contraction of the $\mu$ to $\mu^c$ and a deformation from $\mu^c$ to the predicted facet positions
$x$. At the same time, the deformation is evaluated for image feature match between $\phi$
(fixed) and $f$ at the facet locations $x$: $f = f(Q, x)$



2.3 Implementation 

As described above, one goal of this methodology is to provide a computationally 
tractable method for estimating observed facet locations. This maximization is 
relatively straightforward in the framework of iterated conditional modes (ICM), 
in which each parameter is updated by setting it to the mode of its full
conditional distribution [12].

Since our interest lies in the facet positions $x$, we treat the image $Q$ which is
to be evaluated as a constraint on the model, thus imposing $f = f(Q, x)$ given
the image $Q$ (details in [?], Appendix A). The resulting constrained distribution
on facet locations $x$ in an image $Q$ is proportional to (1), namely

$$p(x \mid \theta) \propto p_S(x \mid \theta_x)\, p_I(f = f(Q, x) \mid \theta_f). \tag{9}$$



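Numerical maximization is required when the locations of facets on the lowest
level are predicted, since the $p_I$ factor in (9) introduces a non-standard
distribution on $x$ under this constraint. Maximization steps over the full conditional
distributions for upper-level facets have closed form solutions. The full
conditional mode for facet position $x_{lj}$ on a level $l$ in the hierarchy, for $l$ not equal to
the top or bottom level ($l \notin \{0, L\}$), is given as

$$\begin{aligned}
x_{lj}^{ICM} &= \mu_{lj}^c + \frac{\sigma_l^{-2} \sum_{k \in P_{lj}} w_{jk}\,(x_{(l-1)k} - \mu_{(l-1)k}^c) + \sigma_{l+1}^{-2} \sum_{k \in D_{lj}} w_{kj}\, \xi_{lk}}{\sigma_l^{-2} + \sigma_{l+1}^{-2} \sum_{k \in D_{lj}} w_{kj}^2}, \\
\xi_{lk} &= (x_{(l+1)k} - \mu_{(l+1)k}^c) - \sum_{j' \in P'_{(l+1)k},\ j' \neq j} w_{kj'}\,(x_{lj'} - \mu_{lj'}^c).
\end{aligned} \tag{10}$$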

In this expression, $k$ indexes facets on level $(l-1)$ or $(l+1)$ in the set $P_{lj}$ or $D_{lj}$,
respectively, of those facets which contribute to the means in the full conditional
distribution on $x_{lj}$. The prime notation (') indicates facet index or parent set
relating to one of the children of facet $lj$. The top level ($l = 0$) facet has a similar
form to (10), dropping terms relating to the set $P_{lj}$.
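
For a normal hierarchical model of this type, the closed-form update for an upper-level facet is a precision-weighted combination of what its parents predict and what its children imply. The Python sketch below shows that calculation for a single coordinate of a single facet. It is derived here from the normal form in (3) rather than copied from the authors' code, and the function name and argument layout are hypothetical.

```python
def conditional_mode_displacement(parent_term, children, var_l, var_next):
    """Mode of the displacement of one facet given its parents and children.

    parent_term : sum over parents k of w_jk * (displacement of parent k)
    children    : list of (w_kj, xi_k) pairs, where xi_k is the child's
                  displacement minus the contributions of its other parents
    var_l       : conditional variance on this facet's level
    var_next    : conditional variance on the children's level
    """
    num = parent_term / var_l
    den = 1.0 / var_l
    for w, xi in children:
        num += w * xi / var_next
        den += w * w / var_next
    return num / den


# With no children the mode is simply the parents' prediction.
print(conditional_mode_displacement(1.5, [], 1.0, 1.0))
# One child with full weight pulls the facet halfway towards it.
print(conditional_mode_displacement(0.0, [(1.0, 2.0)], 1.0, 1.0))
```

The overall scale factor $\kappa$ multiplies every variance and therefore cancels from the update.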

The full conditional distribution for the contraction factor $\gamma_2$ also has normal
form, with the mode given in (11). Assume a normal distribution truncated at
zero with mean $\omega$ and precision $\nu$ for the prior $p(\gamma_2)$. Then

[Equation (11): closed-form expression for $\gamma_2^{ICM}$, involving sums over $k \in P_{lj}$; the right-hand side is not legible in the source.]

and as before, $k$ is the index for a facet in the set $P_{lj}$ which contributes non-zero
terms to the full conditional distribution on $x_{lj}$.

For the ICM maximization steps involving the lowest-level facets, the Nelder-
Mead simplex method is applied [?]. To enhance computational efficiency, our
maximization approach involves partial maximization of the upper levels with
relations between lower-level facets kept fixed, and with approximate image
feature contributions calculated based on the scale-space [?] of the observed and
reference image.



3 Results 

The method described was applied to a dataset from Duke University Medical 
Center consisting of 16 images acquired during the heart beat cycle. For each 
gate, an image of size 64x64x16 voxels was acquired (7.1 mm voxel size). The 
heart was contained entirely in a 16x16x16 voxel volume. Using a 5-level facet
hierarchy, each voxel in the reference heart volume contained one bottom-level
facet, located at the voxel center. The entire hierarchy spans a $16^3$ cube at five
different resolutions ($L = 4$) and has a total of 4681 facets. Gate 8 (filling phase,
mid-diastole) was used as the reference image throughout. The resulting density 
on facet locations x was then maximized for each of the other gates individually. 
Typical maximization time was approximately 3 minutes per gated image on a 
DEC 433au workstation. 
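
The facet count quoted above can be checked directly, assuming $N_l = 2^{ld}$ facets on level $l$ with $d = 3$ and levels $l = 0, \ldots, 4$ (the helper name is hypothetical):

```python
def facets_per_level(level, d=3):
    """Number of facets on a level of an evenly spaced d-dimensional grid."""
    return 2 ** (level * d)


counts = [facets_per_level(l) for l in range(5)]
total = sum(counts)
print(counts)  # [1, 8, 64, 512, 4096]
print(total)   # 4681
```

The bottom level has $16^3 = 4096$ facets, one per voxel of the heart volume.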

The results are summarized as follows. First, we show plots of facet move- 
ment from slices in the reference image to slices in another gated image for 
three orthogonal directions. Next, several individual facets are displayed on the 
reference and the other gated image to demonstrate the deformation achieved 
under the model. We then display a composite image and compare it to the 
traditional SPECT image and a single gated image. Difference images for the 
composite versus the traditional and gated image are also shown. Subsequently, 
estimated changes in overall size (contraction) are shown for the time series. 
Finally, convergence and stability relative to initial conditions are examined.

3.1 Facet Motion




Fig. 3. Facet motion from gate 8 (reference; mid-diastole) to gate 4 (end-systole) in
representative slices. Panels (a)-(c) show a transaxial plane, (d)-(f) a coronal plane 
and (g)-(i) a sagittal plane. The left column shows the reference image slice, the middle 
column shows the facet motion estimate and the right column shows that same estimate 
superimposed on the image slice from the gate 4 image 



In Fig. 3, facet displacement vectors from gate 8 to gate 4 are shown for a
representative slice in three orthogonal directions. Gate 4 corresponds to the
contracted state (end-systole). The general contraction from diastole (gate 8) to
systole is clearly captured by this estimated mode of the joint density on facet
locations. Note also that though the grid has contracted, there are also regions
on the heart which have not moved significantly. This is consistent with typical
heart motion. Thus, the model achieved two of the objectives stated earlier: there
was an overall size correction as well as a local deformation.

Fig. 4. Selected facet positions in transaxial slice 8 of the reference image (a), with
corresponding estimated facet positions in the gate 4 image, slice 8 (b) and 9 (c).
Note the contraction and deformation in relative positioning without losing relations
between neighboring facets, as well as the slice-jump, indicating fully 3D deformation

The positions of several individual facets in the reference image and gate 4
are shown in Fig. 4. Points in the heart are estimated under the model to deform
in a complex manner, and are consistent with the heart shape seen. Note also
the fully 3-dimensional nature of the deformation as evidenced in the slice-jump
of the upper leftmost section.

3.2 Summary Images 

To more accurately represent the distribution of radiotracer uptake in the heart,
a facet-composite (composite for short) image was calculated. The maximization
for facet placement in all gated images was consolidated into this composite
image by mapping the gated image intensity found at the facet position in each
image to that facet's reference image position and averaging across the image
sequence, where $Q_t$ is the $t$-th image in the time series of $n$ images and the
image of the gate used as the reference is $T$. This image (Fig. 5(a)) compares
favorably to both of the other representations of the data, the voxel-wise mean
(standard SPECT equivalent, Fig. 5(b)) and the gated image alone (Fig. 5(c)).
Image intensity uniformity has
also been improved in the LV wall region relative to the gated image alone, while
retaining image contrast. Comparing the facet-composite with the mean image
shows a better delineation of the lateral wall of the left ventricle. The composite
image thus represents a specific state of the heart (here, it maps to mid-diastole)
rather than a time-averaged state which does not exist. The difference images
shown in Fig. 6 highlight the structural differences between the summary images.
The mean image minus the composite image (pixel-wise difference) shows a clear
pattern (dark and bright) that corresponds to the lateral wall of the left ventricle.
Again, this corresponds to known heart motion. Also, when the composite image
is subtracted from the gate 8 reference image, no pattern other than the known
intensity difference between an average and any individual gate late in the time
series is apparent. (Early gates tend to have more intensity due to an out-of-
phase blurring effect which worsens towards the end of the beat cycle.)

Fig. 5. Slice 9 (transaxial): (a) Composite image mapped to gate 8, (b) mean voxel-
wise image (standard SPECT), (c) gate 8 image only. The composite image was formed
by computing average intensity based on the facet motion through the series of images
and mapping back to the reference image (gate 8). We observe similarity between gate
8 and composite, with the composite having superior smoothness in regions of activity.
Furthermore, a better spatial delineation of the heart wall in the composite image
relative to the standard SPECT image is seen

Fig. 6. Difference images corresponding to the images in Fig. 5: (a) mean minus
composite (mapped to gate 8), (b) gate 8 minus composite. For detailed explanation of the
composite image, see Sect. 3.2. Here we see clearly the structural difference between the
standard (mean) image and the composite facet-based image. The regions of dark and
bright indicate that the deformation model has shifted intensity outward for the lateral
wall of the left ventricle (arrow). This is in accordance with the use of gate 8 (mid-
diastole) as the reference. The gated versus composite comparison shows no structural
differences other than an overall intensity level difference in the heart region, which is
attributable to a known intensity trend discussed in the text

Fig. 7. Estimated overall contraction through cycle, measured in percent of total
volume spanned by $\mu^c$ relative to $\mu$. This corresponds well to the expected and observed
heart contraction through the series of images. It is not interpretable as heart chamber
volume, however
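
The facet-composite calculation of Sect. 3.2 can be sketched as follows: for every bottom-level facet, the intensity found at its estimated position in each gate is averaged and written to the facet's reference position. The data layout (images as dictionaries keyed by voxel index), the nearest-voxel lookup and the name `composite_image` are assumptions made for this example; the interpolation used by the authors is not stated in the text.

```python
def composite_image(images, facet_positions, reference_positions):
    """Average, over all gates, the intensity found at each facet's position.

    images[t]           : dict mapping a voxel index (tuple of ints) to intensity
    facet_positions[t]  : estimated positions of all facets in gate t
    reference_positions : positions of the same facets in the reference image
    """
    n = len(images)
    composite = {}
    for j, mu_j in enumerate(reference_positions):
        total = 0.0
        for t in range(n):
            # Nearest-voxel lookup (an assumption of this sketch).
            voxel = tuple(int(round(c)) for c in facet_positions[t][j])
            total += images[t][voxel]
        composite[tuple(int(round(c)) for c in mu_j)] = total / n
    return composite


# Two gates of a tiny one-dimensional "image" with two facets.
images = [{(0,): 10.0, (1,): 20.0}, {(0,): 30.0, (1,): 50.0}]
reference_positions = [(0.0,), (1.0,)]
facet_positions = [reference_positions, [(1.0,), (0.0,)]]  # facets swap in gate 1

comp = composite_image(images, facet_positions, reference_positions)
print(comp)
```

In contrast, a voxel-wise mean would average intensities at fixed voxel indices and ignore the estimated facet motion.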

3.3 Contraction 

A relatively non-informative prior distribution on $\gamma_2$ (uniform on $\langle 0, 3 \rangle$) was
used for the results reported here. Figure 7 shows the contraction correction
as a function of time (gates) when taken as an overall volume change. The
parameters behave sensibly through the cycle: Gates 1-6 comprise the relatively
short contraction phase (systole), while the remaining images are acquired in the
expanded or dilated state (diastole) of the heart. The parameter time evolution
tracks this. Since this is an overall correction, this parameter could be interpreted
as a rough indicator of relative heart size, but it should not be used as a measure
of particular quantities, such as LV chamber volume. The general trend shown
in Fig. 7 matches well with visual inspection of changes in heart size over the
image series.



3.4 Stability 

Each full ICM cycle includes an iterative maximization over facet locations $x$,
followed by a maximization for $\gamma_2$. The model was allowed to run for 200 such
cycles, and did not exhibit any significant changes in parameter estimates or facet
locations from the values determined with a shorter run (5 full cycles). Previous
work [?] has shown fast convergence of the maximization for a model which
does not incorporate contraction ($\gamma$) directly. Finally, several starting positions
for the maximization routine were used without changing the final results.

4 Discussion

The model proposed in this paper improves the utility and applicability of gated 
cardiac SPECT data. While future modifications to the model are being inves- 
tigated, current results are promising. Taken together, the set of locations for 
every facet in every gate constitutes a very rich representation of this image 
data. With further investigation of true anatomical correspondence, this repre- 
sentation will offer new diagnostic ways to look at heart function abnormalities 
via estimated deformations rather than based solely on radiotracer uptake. The 
facet-composite image is also a clearly improved summary of the image time 
series over the voxel-wise sum and offers better intensity uniformity in the heart 
region and from that a better SNR than the individual gated images. In the 
future, we plan to evaluate numerically this improvement in SNR and the accu- 
racy of estimated facet locations using a newly developed Monte Carlo computed 
phantom of a beating heart in a thorax [?]. This is important since we currently
have no clinical data available to evaluate the real motion of individual heart 
tissue elements for these time series. With such reference data, numerical eval- 
uations of performance and subsequent educated model modifications will be 
possible. More advanced modeling based on known physiology of heart contrac- 
tion, use of smoothness constraints on individual facet motion and the inclusion 
of registered and simultaneously acquired transmission computed tomography 
(TCT) data are current model extensions under investigation. 


Abstract. A method has been developed for detecting masses in mam-
mograms by analysis of local orientation patterns. Concentration of gra-
dient and line orientation computed at a fine scale reveals the presence
of masses and spiculation, respectively. In this paper a new computa-
tional approach is presented which allows efficient computation of these
features as a continuous function of spatial scale. It is shown that by us-
ing these scale signatures estimates of mass size can be readily obtained.
Experimentally it was found that mass size estimates can be used to
improve mass detection, while full exploitation of the information repre-
sented by the scale signatures is expected to lead to further improvement.
Results are presented for detection of malignant masses in a database of 264
mammograms representing 71 consecutive cancers found in screening.



1 Introduction 

The success of breast cancer screening programs critically depends on the ability 
to detect non-palpable invasive cancers when they are still small, as tumor size is 
a very important prognostic factor [?]. Invasive cancers are visible as masses.
Ideally, these should be detected when they are smaller than 1.5 cm, because then
they are detected early enough to have a strong impact on overall mortality
reduction. Masses smaller than 5 mm are rarely visible in mammograms. Detection
of non-invasive intraductal in situ cancers, only visible by microcalcifications, is
less effective as many of these do not become invasive during the patient's lifetime.

Detection of small masses in screening mammograms is difficult, because they 
may be hard to distinguish from normal fibroglandular tissue patterns. Moreover, 
in a screening population only three to six out of thousand women have breast 
cancer. This very large fraction of normal cases makes screening a complex visual 
task for radiologists. To avoid perception errors, radiologists need to be alert at
a constant high level. That failures are not uncommon has been revealed by a
number of studies. Recently, it was found in a large multi-center study that
as much as 70 percent of the cancers detected in screening were already visible
on previous screening mammograms, where up to 20 percent was obvious enough
to be classified as actionable by the majority of a panel of reviewing radiologists
[?]. In another recent study, findings at previous screening mammograms of 544
cancers that were not detected in screening were reviewed. In 25% of these
cases the tumor was classified as overlooked or misinterpreted. An effective way
to increase performance of radiologists in screening is double reading [?].
Sensitivity increases of 5-15 percent have been reported with two independent
readers. However, implementation of double reading may be hard to organize
because of cost and time limitations. As an alternative, it has been suggested 
that computer programs that identify suspicious regions in mammograms can 
be used as a second reader. This approach turned out to be successful in a few 
small studies [?], but its success in practice will depend on the level of
performance of detection algorithms in terms of both sensitivity and specificity. 

Masses in mammograms can be described as more or less compact areas 
that appear brighter than the tissue in which they are embedded, due to a 
higher attenuation for X-rays. When the tissue surrounding a mass is fatty, 
the detection problem is relatively easy and tumors as small as 5 mm can be 
detected. However, when a mass is projected in dense fibroglandular tissue it may 
be very difficult to recognize. Even large masses may be completely obscured by 
dense tissue [?]. This is one of the reasons for taking two different views of
each breast, as is common practice in most screening programs. Usually, oblique 
and cranio-caudal (CC) projections are recorded. The appearance of masses can 
be circumscribed, fuzzy, or spiculated. In the latter case there is a radiating 
pattern of spicules surrounding the central mass area. Differentiation of masses 
from normal glandular tissue structures may be so difficult that one has to 
rely on distortion or asymmetry of the normal mammographic pattern, while 
sometimes a comparison with previous mammograms provides an important cue. 
Stellate patterns of straight lines are especially suspect, as are straight retractions
of the glandular tissue boundary. Bilateral asymmetry may form an important
clue when a mass-like area only appears in one side. Furthermore, the location
of a suspect area sometimes plays a role. For instance, in a fatty area behind
the glandular tissue and close to the chest wall, the presence of a mass is very
suspect if it does not have a corresponding sign in the contralateral breast. Some
examples of malignant masses are shown in Fig. 1.



In the past decade different methods for detection of masses in mammograms
have been suggested, some focusing on bilateral asymmetry [?], detection
of spiculation [?], or on contrast and texture differences [?]. All these
methods have some aspects in common. Usually, a first phase is executed in which
local image features are calculated at each pixel or at a set of regularly spaced
points across the segmented breast area. Using these features, pixels are grouped
into regions by a segmentation scheme. In a second phase features are calculated
for each candidate region and a classifier determines regions that are regarded as
suspicious. Various methods differ in the way they address and emphasize each
of the two phases. Some apply very simple procedures to form many candidate
regions and rely heavily on region classification in order to remove an abundance
of false positives. The approach that is taken here is to concentrate on designing
features that can be computed directly from the pixel grid, i.e. without requiring
a region boundary. A classifier computes the likelihood of each pixel to be part
of a mass, and a simple threshold on this likelihood image is applied to segment
regions marked as suspicious as the final output.

Fig. 1. Examples of malignant lesions: a circumscribed mass (left), a spiculated mass
(middle), and an architectural distortion (right)

Analysis of local line and gradient direction patterns forms the basis for 
computation of the local features that are used. A detailed description of this 
method is given in Sect. 2. The method is an extension of earlier work [?]. The
size of the neighborhood in which orientation patterns are evaluated is one of
the most important parameters in the computation of these features. Variation 
of this size can have a dramatic effect on the detection of individual cancers, 
although the influence of this parameter on the overall performance measured 
on a large database tends to be less. In the past, the output of a local contrast 
operator has been used to set the size of the neighborhood adaptively. In Sect. 2 
a new approach is presented, in which features are computed as a continuous 
function of the neighborhood size, only slightly increasing the computational 
load. The curves that represent the directional features as a function of the radius 
of the neighborhood reveal aspects of the neighborhood patterns that may be 
very useful for improving detection performance by removing false positives. 

In Sect. 3 it is shown that the maximum of a gradient orientation feature can 
be used to estimate the size of a lesion. This size is used as an additional feature 
to improve detection performance in a scheme where various local features are 
combined using a neural network classifier. In Sect. 5 results are shown that were
obtained on a series of 264 mammograms, representing consecutive cases of
cancer detected in screening, excluding cases which only had microcalcifications. 

2 Methods 

2.1 Local Orientation Distributions 

It has been shown that features representing local orientation distributions are
well suited for detection of masses in mammograms. The fact that such
features are very insensitive to changes in contrast is a major advantage when
processing large datasets of mammograms of various origin, because one has to
deal with unknown non-linear variation of the greyscale. Orientation maps are
computed using first and second order Gaussian derivatives. When there is a
concentration of gradient orientation towards a certain point this indicates the
presence of a mass. A concentration of line orientations computed from second
order directional derivatives indicates the presence of spiculation or architectural
distortion. These concentration features will be denoted by g1 and l1,
respectively, for gradient and line orientations. In addition, features representing radial
uniformity measure whether or not an increase of pixels oriented to a center comes
from the whole surrounding area or from a few directions only. These will be
denoted by g2 and l2.

Previously, features for orientation concentration were computed by counting 
the number of pixels pointing to a center, and were defined to measure deviations 
of this number from the expected value in a random orientation pattern. The 
assumption was made that a binomial distribution of this number with mean 
probability p of a pixel pointing to a center can be used for normalization. As 
the probability p of hitting the center varies with the distance, this normalization
may not be the best choice. A more general definition of the features is given below,
which allows varying values of p to be handled properly.

For computation of the features at a given pixel $i$ a circular neighborhood is
used. All pixels $j$ located within a distance $r_{min} < r_{ij} < r_{max}$ from $i$ are selected
when the magnitude of the orientation operator exceeds a small threshold. This
selected set of pixels is denoted by $S_i$. The features are based on a statistic $x_j$
defined by

$$x_j = \begin{cases} 1 - p_j, & \text{if pixel } j \text{ oriented to center,} \\ -p_j, & \text{else} \end{cases} \tag{1}$$

with $p_j$ the probability that pixel $j$ is oriented towards the center given a
random pattern of orientations $\varphi$ with a probability density $f_i(\varphi)$. In principle this
density can be estimated from the image in an area around site $i$. However, in
this work only a uniform density is used. Pixels that are oriented to the center
are determined by evaluating

$$|\varphi_j - \alpha_{ij}| < \frac{D}{2 r_{ij}} \tag{2}$$

with $\alpha_{ij}$ the direction of the line through $i$ and $j$ and $D$ a constant determining
the accuracy with which pixels should be directed to the center to be counted.
A weighted sum $X_i$ is computed by

$$X_i = \sum_{j \in S_i} w_j x_j \tag{3}$$

where the weight factors $w_j$ can be chosen as a function of the distance $r_{ij}$, for
instance to give pixels closer to the center a larger weight. For a noise pattern,
the variance of this sum can be estimated when it is assumed that all pixel
contributions are independent:

$$\mathrm{var}(X_i) = \mathrm{var}\Big(\sum_{j \in S_i} w_j x_j\Big) = \sum_{j \in S_i} w_j^2\, \mathrm{var}(x_j) = \sum_{j \in S_i} w_j^2 (p_j - p_j^2). \tag{4}$$

Normalizing the sum $X_i$ by the square root of the variance, the value of the
concentration feature $f_i$ is defined by

$$f_i = \frac{X_i}{\sqrt{\sum_{j \in S_i} w_j^2 (p_j - p_j^2)}}. \tag{5}$$
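
The following Python sketch shows how a concentration feature of this form might be evaluated for gradient directions around one site. It makes several assumptions that go beyond the text: directions are measured over the full circle, the center lies in the direction from pixel $j$ towards site $i$, and under a uniform density the hit probability is taken as the angular window width divided by $2\pi$. The function names and the example data are hypothetical.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference between two directions, in (-pi, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d


def concentration_feature(center, pixels, D, r_min, r_max, weight=None):
    """Normalized sum of x_j over pixels in the ring r_min < r < r_max.

    center : (x, y) of site i
    pixels : iterable of (x, y, phi), phi being the local direction in radians
    D      : width of the central target region
    weight : optional function of the distance r (default: no weighting)
    """
    X = 0.0
    var = 0.0
    for x, y, phi in pixels:
        dx, dy = center[0] - x, center[1] - y
        r = math.hypot(dx, dy)
        if not (r_min < r < r_max):
            continue
        half_width = D / (2.0 * r)
        alpha = math.atan2(dy, dx)  # direction from pixel j towards site i
        hit = abs(angle_diff(phi, alpha)) < half_width
        # Assumed uniform density over 2*pi: window of width 2*half_width.
        p = min(1.0, half_width / math.pi)
        w = 1.0 if weight is None else weight(r)
        X += w * ((1.0 - p) if hit else -p)
        var += w * w * (p - p * p)
    return X / math.sqrt(var) if var > 0.0 else 0.0


# Eight pixels on a circle of radius 10, all pointing at the center.
pts = [(10 * math.cos(2 * math.pi * k / 8), 10 * math.sin(2 * math.pi * k / 8))
       for k in range(8)]
toward = [(x, y, math.atan2(-y, -x)) for x, y in pts]
away = [(x, y, math.atan2(y, x)) for x, y in pts]

f_toward = concentration_feature((0.0, 0.0), toward, D=4.0, r_min=5.0, r_max=15.0)
f_away = concentration_feature((0.0, 0.0), away, D=4.0, r_min=5.0, r_max=15.0)
print(f_toward, f_away)
```

A pattern converging on the site gives a large positive value, while a pattern pointing away gives a small negative one, and a random pattern has expected value zero.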



When no weight factors are used and the neighborhood $S_i$ is subdivided in $K$
rings around $i$ in which the probability $p_k$ can be considered constant, the sum
$X_i$ can be written as

$$X_i = \sum_k \sum_{j \in S_{i,k}} x_j. \tag{6}$$

These rings are circular and concentric when the probability density of the
orientations $f(\varphi)$ is taken uniform. In each ring $k$ the number of pixels hitting the
center $N_{k,hit}$ can be counted, allowing the sum to be rewritten as

$$\begin{aligned}
X_i &= \sum_k \big[ N_{k,hit}(1 - p_k) + (N_k - N_{k,hit})(-p_k) \big] \\
&= \sum_k \big[ N_{k,hit} - N_k p_k \big] \\
&= N_{hit} - N\bar{p}
\end{aligned} \tag{7}$$

with $N_k$ and $N$ the number of pixels in ring $k$ and in total, respectively, and $\bar{p}$ the
mean of $p$ over all pixels. This is
identical to the definition used previously, but the normalization factor, which
can be written as $(N(\bar{p} - \overline{p^2}))^{-1/2}$, is slightly different from the one used before,
$(N(\bar{p} - \bar{p}^2))^{-1/2}$.

If weight factors are used that only depend on $p_j$, the sum $X_i$ can be written
as

$$X_i = \sum_k w_k\,[N_{k,\mathrm{hit}} - N_k p_k] \qquad (8)$$

which shows that the expected value of $f_i$ remains zero. If the probability density
$f(\varphi)$ is uniform, all choices of $w_j$ that depend only on $r_{ij}$ fall in this category.
Results shown in this paper were obtained without using weights. Thus far no
clear advantage of using a non-uniform weight function could be demonstrated
experimentally.

Fig. 2. Detection of a spiculated mass using line (top) and gradient orientation
(bottom) maps. The figures in the central column show the labels allocated to
pixels based on their orientations, when the window is centered at the tumor. The
right column shows the output of the line and gradient concentration filters l1
and g1. Pixels are marked white when they are oriented towards the center, or grey
when they are not. For the orientation filter, pixels are marked black when their
line magnitude is negative: dark linear structures do not represent spicules and
are excluded



It is noted that the approximation that is made by assuming all pixels to
have independent directions is clearly incorrect, even when pixels have
independent random values. Orientations of neighboring pixels become correlated by
the use of convolution kernels for estimation. This leads to underestimation of
the variance, which becomes larger with larger kernels. However, it seems that
this effect is similar for normal and abnormal areas. For the purpose of removing
the dependency on the size of the neighborhood and compensating for unwanted
effects at the breast edge the method is effective.

In Fig. 2 an example is shown of a spiculated lesion. The line orientation
feature l1 shows a peak at the center of a spiculated mass. This coincides with
an increase of the gradient concentration feature g1, which is not very strong in
this case because the mass is not very compact. By combining the features at
each pixel using a classifier and segmenting the result, a highly suspicious
region is obtained.

Features g2 and l2 that measure radial uniformity of the orientation patterns
around site i are computed by subdividing the neighborhood $S_i$ in L directional
bins, like the slices of a pie. The statistic $X_i$ is now computed for each bin.
When there is only noise the expected value of $X_i$ in each bin is zero. In
previous work, the number of bins was counted in which the number of pixels
pointing to the center was larger than the median of a binomial distribution
determined by $N_l$ and $p_l$, with $N_l$ the number of pixels in bin l and $p_l$ the
average probability of hitting the center. This definition had some problems, as
the median of a binomial distribution is not exactly defined. With the approach
described here, it is sufficient to compute the number of bins $n_+$ in which the
sum of $x_j$ is positive. The radial uniformity feature is defined by

$$\frac{n_+ - K_i/2}{\sqrt{K_i/4}} \qquad (9)$$

with $K_i$ the number of sectors at i. The standard deviation of $n_+$ for random
noise, $\sqrt{K_i/4}$, is used for normalization, which is important to avoid problems
at the edge of the breast where not all sectors can be used.
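The bin-counting step can be illustrated with a short Python sketch. It assumes that, for pure noise, each bin sum is positive with probability 1/2, so that $n_+$ has mean $K_i/2$ and standard deviation $\sqrt{K_i/4}$; the centering by $K_i/2$ is that assumption made explicit, and the function name `radial_uniformity` is hypothetical.

```python
import math

def radial_uniformity(bin_sums):
    """bin_sums: the statistic X_i accumulated separately in each usable
    directional bin around site i (bins outside the breast are left out)."""
    K = len(bin_sums)
    if K == 0:
        return 0.0
    # number of bins in which the summed statistic is positive
    n_plus = sum(1 for s in bin_sums if s > 0.0)
    # ASSUMPTION: under noise n_plus ~ Binomial(K, 1/2), hence mean K/2
    # and standard deviation sqrt(K/4)
    return (n_plus - K / 2.0) / math.sqrt(K / 4.0)
```

Because K is taken from the bins that are actually available, the same code applies near the breast edge, where fewer sectors can be used.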

2.2 Computation of Features as a Function of Scale 

In multiscale methods one tries to match the scale of feature extraction to the
scale of the abnormality in order to optimize detection performance. Generally,
the values of features used for mass detection depend strongly on the size of
the abnormality, which makes multiscale approaches attractive. However, most
multiscale methods are computationally intensive, because features have to be
computed repeatedly at a number of scales. Usually only a very limited number
of scales is chosen, which reduces accuracy. Multiscale methods that have
been proposed for detection of masses in mammograms include wavelets [?],
maximum entropy [?], and multi-resolution texture analysis [?]. Line
concentration measured at a number of scales was also used in previous work on
detection of stellate lesions, where the maximum over the scales was used [?].

Fig. 3. Construction of an ordered list of neighborhood pixels. In the right
figure the grey value of each pixel represents its placement in the list,
obtained by sorting with respect to distance to center

In this section a method is described that allows very efficient computation
of a class of local image features as a continuous function of scale, only slightly
increasing the computational effort needed for computation at the largest scale
considered. The non-linear features described in the previous subsection belong
to this class. In the first step of the algorithm an ordered list is constructed in
which each element represents a neighbor j within distance $r_{\max}$ of the central
location i. In this list, positional information of the neighbor that is needed for
the computation is stored, here the $x_j, y_j$ offset, angle $\varphi_j$ and distance with
respect to the center. This list is constructed by visiting all pixels in any order,
and by subsequently sorting its elements by distance to the center. In the second
step the actual computation of the features takes place, at each pixel or at a
given fraction of pixels using a sampling scheme. The ordered list of neighbors is
used to collect the data from the neighborhood. The $x_j, y_j$ offsets in the list are
used to address the pixel data and precomputed derivatives or orientations at the
location of the neighbor. The orientation with respect to i is used to compute
orientation related features. Because the neighbors are ordered with increasing
distance to the center, computation of the features from the collected data can
be carried out at given intervals, for instance each time the number of neighbors
has increased by some fixed number. As the computational effort lies in collection
of the data, this only slightly increases the computational load. We use intervals
in which the number of neighbors increases quadratically. Thus, features are
computed at regularly spaced distances from the center. An example is shown
in Fig. 4, where line and gradient concentration are plotted as a function of the
distance to the center. In a similar way, a contrast feature can be computed by
collecting the sum of pixel values as a function of distance to the center, and
by subtracting the mean of the last interval from the mean of the previous
intervals.
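The two steps can be sketched as follows. This is a minimal illustration, not the implementation used for the reported results: the function names are hypothetical, the running mean of the pixel values stands in for the orientation features, and the neighborhood is assumed to lie entirely inside the image.

```python
import math

def ordered_neighbors(r_max):
    """Step 1: (distance, direction, dx, dy) of every neighbor within
    r_max of the center, sorted by distance to the center."""
    R = int(math.floor(r_max))
    items = []
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            r = math.hypot(dx, dy)
            if 0.0 < r <= r_max:
                items.append((r, math.atan2(dy, dx), dx, dy))
    items.sort(key=lambda t: t[0])
    return items

def mean_signature(image, cx, cy, neighbors, radii):
    """Step 2: a single pass over the sorted list; the running mean is
    emitted each time the next radius in the increasing list `radii`
    has been passed."""
    out, total, count, k = [], 0.0, 0, 0
    for r, _angle, dx, dy in neighbors:
        while k < len(radii) and r > radii[k]:
            out.append(total / count if count else 0.0)
            k += 1
        total += image[cy + dy][cx + dx]
        count += 1
    while k < len(radii):          # radii beyond the last neighbor
        out.append(total / count if count else 0.0)
        k += 1
    return out
```

The list is built once and reused at every site, so evaluating the feature at many radii costs little more than evaluating it at the largest radius only.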

The curves that represent features as a function of the distance to the center
reveal aspects of the neighborhood patterns that can be very useful for
differentiation of true and false positive detections. For instance, in Fig. 4 the
peak of the gradient orientation signature g1(r) is reached at the edge of the
mass, and coincides with the radius at which the spiculation feature l1(r) reaches
some kind of plateau. This observation fits with the model of a mass from which
most spicules radiate from the contour, and therefore should raise more suspicion
than two similar maximum values reached at radii that do not correlate.



3 Applications 

By taking the maximum of the scale signature g1(r) representing gradient
orientation concentration the size of a mass can be estimated. In this section the
accuracy of such a measurement is determined, and the use of size estimates to
improve detection performance is studied.

Results are obtained on a database of 71 consecutive cancers detected in
a bi-annual screening program in Nijmegen, in the period of 1993 to 1996. In
total, this set consisted of 132 mammograms with a cancer. In ten cases only
oblique views had been taken at screening. All cases detected by screening in
the selected period were included, with the exception of those that only showed
microcalcifications. Thus, the set may be regarded as representative for cases
with masses detected by screening mammography. All cancers were annotated
with the help of an expert mammographer. The annotations were used as the
gold standard for evaluation of estimated mass sizes and detection performance.
In cases with spiculated masses or stellate lesions, only the central mass or area
was annotated. The median diameter of the annotated lesions was 15.4 mm,
and 72 percent of the lesions were smaller than 2 cm. This may seem somewhat
large for cancers detected by screening, but one should realize that many small
intraductal cancers with only microcalcifications were excluded. The images
were digitized with a Lumisys-85 digitizer at 50 microns and 12 bits per pixel,
and were averaged down to 200 micron/pixel prior to all further processing.

Fig. 4. Scale signatures of line and gradient concentration features, computed in
the center of the tumor

3.1 Estimation of Tumor Size 

For each mammogram in the database the gradient orientation signatures g1(r)
were computed at sites spaced regularly 1.6 mm apart, storing the radius
$r_{i,\max}$ at which g1(r) has its maximum for each site i. Before taking the
maxima, the signatures were smoothed. Two parameters needed to be adjusted
for calculation of g1: the scale $\sigma$ at which gradient orientations are determined
and the parameter D used to determine whether or not a pixel is oriented to the
center. Results that are shown were obtained by using $\sigma$ = 0.2 mm and D = 4
mm. The interval in which g1(r) was computed was $r \in [2, 20]$ mm, and the
maximum of g1(r) was searched for in the interval [6, 20] mm.

If a pixel is close to the true center of a mass, it is reasonable to use the
maximum of the g1 signature to estimate the size of the mass. In the detection
application we have in mind, however, the true center is unknown. Moreover,
it appeared that the g1 signatures may change considerably when the central
location used in the measurement is somewhat changed, especially when small
values of D are used. Therefore, to estimate size all maxima of the g1(r) curves
measured in a small region were taken into account, rather than relying only on
the curve measured at one place. A small circular region was chosen, denoted by
C. Different methods to estimate size were evaluated:

1. The radius corresponding with the highest maximum of g1 in C

2. The mean of the radii corresponding to the maxima of g1 in C

3. The minimum radius corresponding with a maximum of g1 in C

In the experiments the radius of C was 2 mm. 

For validation of the size estimation methods the annotations in the database
were used. It should be noted that not all the annotations in the database
correspond with masses that can be clearly identified. Architectural distortions
without a clear central mass are also included. Using the annotations as ground
truth, the effective radius for each mass is used as size measure. This radius is
defined as $\sqrt{A/\pi}$, with A the area of the annotation in mm². The circular
measurement region C was chosen at the center of mass of each annotation.
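The three estimators and the reference size can be written down compactly. The sketch below is illustrative only: the function names and the input layout (one `(g1_max, r_max)` pair per site inside C) are assumptions made for the example.

```python
import math

def effective_radius(area_mm2):
    """Radius of the circle with the same area as the annotation."""
    return math.sqrt(area_mm2 / math.pi)

def size_estimates(maxima):
    """maxima: list of (g1_max, r_max) pairs, one per site in region C.
    Returns the estimates of methods 1, 2 and 3 in that order."""
    highest = max(maxima, key=lambda m: m[0])[1]        # method 1
    mean_r = sum(r for _, r in maxima) / len(maxima)    # method 2
    smallest = min(r for _, r in maxima)                # method 3
    return highest, mean_r, smallest
```

For example, `size_estimates([(3.0, 10.0), (5.0, 12.0), (4.0, 8.0)])` gives 12.0 for the highest maximum, 10.0 for the mean radius and 8.0 for the minimum radius.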

Results are shown in Fig. 5. It appears that taking the mean of the radii
in a small region yields the most accurate estimates. It is also shown that the
tumor sizes are somewhat overestimated by the radius at which the maximum
of g1(r) occurs. This bias can be easily corrected for by subtracting a constant.
In some cases, however, it appears that the estimated tumor size is far too large.
Some of these are cases where a central mass is less or hardly visible. It would be
interesting if a measure could be derived from the g1(r) signature that represents
whether the shape of the curve is typical for a mass. Obviously, a curve with a
clear maximum will more likely yield a good estimate of tumor size than a curve
that is more flat. Such a measure was defined by

$$g_3 = \sum_{r} \left| g_1(r) - g_1(r - \Delta r) \right| \qquad (10)$$

with R the estimated size of the mass. In Fig. 5 it is shown that by using this
measure a subset of mass cases can be obtained that have better size estimates,
where a threshold on g3 was set such that 60 percent of the cases were selected.

It was found that size estimates based on g1 signatures computed with a
high value of $\sigma$, the gradient scale, were less accurate. Size estimates obtained
by using a smaller value of D in the g1 computation were comparable to those
in Fig. 5.

3.2 Mass Detection Performance 

Experiments to determine detection performance were carried out using the
database described earlier in this section. By adding 132 bilateral normal
mammograms a set of 264 mammograms was obtained. According to the major
radiologic sign, mammograms were classified as masses (68), spiculated masses (44),
architectural distortions (12) and asymmetries (8). Features that were used were
the line and gradient orientation features described in Sect. 2, each computed
at the scale where the orientation concentration feature reached its maximum
value. In addition, features representing bilateral asymmetry and estimated size
were used, where size was computed as explained in the previous section,
centering a small region at each site to be classified. The asymmetry feature was
computed by non-rigid registration and subtraction of the right and left breast,
followed by a Gaussian smoothing to focus on large asymmetries only [?].

Fig. 5. Comparison of tumor size estimates with true sizes derived from annotations.
The estimates are based on g1 signatures in a region of 125 pixels around the center
of mass of the annotation. The g1 signatures were computed with D = 4 mm and
$\sigma$ = 0.2 mm. Top row: size estimates obtained at the site with the highest value of
$g_{1,\max}$ (left) and at the site with the smallest radius at the maximum of g1(r) (right).
In the bottom row the plots show estimates computed as the mean of the radii in the
region, for all cases (left) and for cases with g3 > T, which more likely correspond to
a well defined mass

A neural network classifier, used to compute the likelihood of suspiciousness
of individual pixels, was trained on a separate dataset. All 39 mammograms
with malignancies in the public MIAS database were used for this purpose [?],
excluding those with only microcalcifications. It should be noted that these
images were digitized using a different digitizer. Features that are used are
defined in such a way that this should not make a difference.

FROC curves displaying results are shown in Fig. 6. Sensitivity is computed
as the number of lesions hit divided by the total number of lesions. A hit was
counted when the center of mass of a region marked by the detection program
fell inside the annotation, otherwise a false positive was counted.

Fig. 6. Using estimated size in combination with asymmetry and mass detection
features (a). The overall performance is improved when adding features for detection
of spiculation (b). The case-based FROC curve is obtained by counting a true positive
whenever a cancer is found in either the CC or oblique view

The results in Fig. 6 show that adding estimated size to the mass detection
method based on asymmetry and gradient orientation in single views improves
detection performance. The performance improves further when features to
detect spiculation are added. The sensitivity that is obtained using a case-based
measure is over 90 percent at a false positive rate of 0.5 FP/image. Remarkably,
at a rate of 1 false positive in 50 images still 50 percent of the cases are flagged.



4 Conclusions 

An efficient method to compute features representing line and gradient orien- 
tation concentration as a function of spatial scale was developed. Using such 
scale signatures size estimates of mammographic masses can be obtained. These 
estimates can be useful when regions with masses need to be segmented. Also, 
the radii at which the feature maxima occur can be used to select the neighbor- 
hood size adaptively. In combination with other features, including asymmetry, 
adding estimated size as a feature led to improved detection results. 

On a consecutive sample of non-microcalcification cases from screening, most
with oblique and CC views, a high case sensitivity was obtained. The use of the
mass detection features gave a much larger improvement of the FROC curve on
this database than on datasets biased towards spiculated masses used previously
[?], as could be expected. Interestingly, this also holds for very low FP/image
values. At 0.02 FP/image a case sensitivity of 50 percent was obtained. Assuming
an incidence of 5 cancers per 1000 women and 4 mammograms per case this
corresponds with a recall rate of 8 percent, which is quite common in the US. It
may be advantageous in a prompting system to present suspicious regions that
have such a high specificity to the radiologists in a different way than regions
only generated at lower specificity levels.
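The recall-rate figure follows from simple arithmetic, which the short Python sketch below spells out. The variable names are illustrative, and treating every false positive mark as a separate recalled woman is a simplifying assumption that gives an upper bound.

```python
fp_per_image = 0.02        # operating point on the FROC curve
images_per_case = 4        # two views of each breast
incidence = 5 / 1000       # cancers per screened woman
case_sensitivity = 0.5     # at this operating point

# ASSUMPTION: each false positive mark leads to one recalled woman
false_recalls = fp_per_image * images_per_case    # about 0.08 per woman
true_recalls = incidence * case_sensitivity       # about 0.0025 per woman
recall_rate = false_recalls + true_recalls
```

The false positive marks dominate: they account for about 8 recalls per 100 women, while detected cancers add only about a quarter of a percent.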

The method described in this paper is applicable to other areas in medical 
image analysis, for instance to lung nodule detection in CT. The features used to 
represent local orientation patterns are general and can be computed efficiently 
in 3D datasets as well.



Abstract. Today, most studies of cognitive processes using functional
MRI (fMRI) experiments adopt highly flexible stimulation designs, where
not only the activation amount but also the time course of the measured
hemodynamic response is of interest. The measured signal only indirectly
reflects the underlying neuronal activation, and is understood as being
convolved with a hemodynamic modulation function. An approach to
better allow inferences about the neuronal activation is given by modeling
this convolution process. In this study, we investigate this approach
and discuss computational models for the hemodynamic response. An
analysis of a recent fMRI experiment underlines the usefulness of this
approach.



1 Introduction 

Functional magnetic resonance imaging (fMRI) has become one of the major
experimental methods for analyzing cognitive processes in humans. The most
common fMRI technique employs the blood-oxygen-level-dependent (BOLD)
contrast [?], which is sensitive to changes of the relative local concentration
of oxygenated hemoglobin (HbO2) vs. deoxy-hemoglobin and thus reflects an
indirect measure of the brain’s neuronal activation. This effect is small, and data
are noisy: thus, analysis of fMRI data has mostly focused on the detection and
statistical quantification of functional activation.



[Figure panels: Input Stimulus; fMRI Time Series]

Fig. 1. Signals at various stages of the convolution model of fMRI time series

Understanding brain function requires information not only on the spatial
localization of neural activity, but also on its temporal evolution. There is an
increasing interest in the time course (i.e. the shape) of the hemodynamic response
(HR) and its modulation with respect to different experimental conditions. The
measured fMRI signal y(t) is understood as the result of a series of convolutions
of the input stimulus function i(t) ([?], see Fig. 1). So the question was
raised to what extent conclusions may be drawn about the neuronal activation
n(t) from the HR shape g(t). For example, the hemodynamic modulation
introduces time constants at least an order of magnitude longer than the underlying
functional activation; the time-to-maximum of a HR due to a transient stimulus is
typically delayed by 5-8 s and dispersed by 3-4 s [?]. So the key to detecting
changes in the neuronal activation is the adoption and deconvolution of the HR by
a model function [?].

A number of heuristic functions have been proposed to describe the
hemodynamic response: the Poisson function [?], the Gamma function [?], a linear
combination of the Gamma function and its temporal derivatives [7], and the
Gaussian function [?]. The evolution of these approaches follows their modeling
complexity; early approaches assumed constant pre-set values for the lag [?],
while current models determine HR parameters voxelwise in the time series
[?], or even per stimulus period [?]. HR parameters were shown to depend
on the subject, the site and the stimulation conditions [?], which underlines
the usefulness of this approach. However, some issues were raised.

– With the Poisson or the Gamma functions, interesting shape characteristics
like delay (time-to-maximum), rise and fall times are hard to obtain.

– While the best fits to an HR are generally found with the Gaussian function,
especially responses following short stimuli were asymmetric (shorter rise
than fall times).

– For a better understanding of the underlying neuronal processes, a
deconvolution of the hemodynamic modulation to yield parameters of the neuronal
activation directly is highly desirable.

– None of these functions is based on a physiological model. Although models
of the oxygen delivery at membranes have been proposed [?], details of the
neurono-vascular coupling are still under discussion and have not yet led to
a comprehensive physiological model of hemodynamic modulation.

Aims of this study were: (1) to test the feasibility of introducing more complex 
model functions for the HR, (2) to separate parameters describing the hemo- 
dynamic modulation from parameters of neuronal activation, and (3) to find 
physiologically more plausible models for the hemodynamic modulation. 

Recently, we described and validated a non-linear regression context [?] to
model the HR per stimulation period (trial) and region-of-interest (ROI), which
is briefly reviewed in the next section, along with a discussion of the three model
functions studied here: (1) the Gaussian function, (2) a convolved asymmetric
Gaussian function, and (3) a convolved compartment model. To compare the
usefulness of these approaches, we re-analyzed an fMRI study of working memory.



2 Description of the Estimation Model

A theoretical discussion and validation of our estimation model is described
elsewhere [?]. For an excellent discussion about non-linear regression procedures,
see [?]. Throughout this paper, we assume that the locus of a functional activation
is known. This knowledge may arise from a previous determination by
well-established signal detection procedures or from a definition as regions of
neurofunctional interest. These regions are considered as stationary in space. Note
that we focus on single trial experimental designs here: a single cognitive task is
given, and the hemodynamic response to this stimulus is recorded.



2.1 Model Definition 

We consider a subset of the fMRI data collected spatially from a ROI of k
voxels and temporally from a single experimental trial at l discrete timesteps
and denote this $n = k \cdot l$-dimensional vector as $\mathbf{y}$. Timesteps are referenced by
an l-dimensional vector $\mathbf{t}$. We model the hemodynamic response as a deterministic
function $g(\mathbf{t}, \beta)$, where $\beta$ denotes a p-dimensional vector of model parameters,
and we require that $g(\mathbf{t}, \beta)$ is differentiable at least once with respect to $\beta$.
Data $\mathbf{y}$ are composed of $g(\cdot)$ and a stochastic part $\varepsilon$:

$$\mathbf{y} = g(\mathbf{t}, \beta) + \varepsilon. \qquad (1)$$

The stochastic part is independent of the signal and stationary with respect to
time, and its elements are normally distributed with a nonsingular covariance
matrix $\mathbf{V}$:

$$\varepsilon \sim N_n(0, \mathbf{V}), \quad \text{then} \quad \mathbf{y} \sim N_n(g(\mathbf{t}, \beta), \mathbf{V}). \qquad (2)$$

This allows us to use preprocessed data where the processing has introduced (or
enhanced) a correlation structure. A way to determine the covariance structure
from experimental data is described later in this section.

We will now propose the model functions $g(\cdot)$ investigated in this paper. The
first two are heuristic but offer a parsimonious number of parameters. The third
function is complex but tries to incorporate the properties of tissue
compartments involved in the BOLD effect.



Model 1: Gaussian Function. The best compromise between goodness-of-fit
and the number of model parameters is found with the Gaussian function [?]:

$$g(t, \beta) = a \exp\!\left(-\frac{(t - t_0)^2}{2 d_0^2}\right) + b. \qquad (3)$$

We denote the 4 components of $\beta$ as a: gain (the “height” of the HR), $d_0$:
dispersion (proportional to the duration of the HR), $t_0$: lag (the time from
stimulation onset to the HR peak), and b: baseline. Here, no distinction can be made
between “neuronal” and “vascular” parameters.



Model 2: Convolved Asymmetric Gaussian Function. A first approach to
model the processes depicted in Fig. 1 more closely is to define the HR function
g(t) by a convolution of a neuronal stimulation function n(t) with a hemodynamic
modulation function f(t):

$$g(t, \beta) = n(t) \otimes f(t) + b, \qquad (4)$$



where $\otimes$ denotes the convolution operator and b is a baseline term. We simply
assume a square-wave function for the neuronal stimulation n(t):

$$n(t) = \begin{cases} a & \text{if } t \ge t_0 \text{ and } t < t_0 + t_1, \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$



A Gaussian function is introduced for the hemodynamic modulation function $f(\cdot)$,
here with different dispersions ($d_0$, $d_1$) for the rising and the falling edge:

$$f(t) = \begin{cases} \exp(-t^2/(2 d_0^2)) & \text{if } t < 0, \\ \exp(-t^2/(2 d_1^2)) & \text{if } t \ge 0. \end{cases} \qquad (6)$$



In this model, $\beta$ consists of 6 parameters ($d_0$: dispersion on the rising edge, $d_1$:
dispersion on the falling edge, a: gain, $t_0$: neuronal response onset, $t_1$: neuronal
response duration, b: offset). Modeling of the convolution process allows us to
address the meaning of a, $t_0$, and $t_1$ as “neuronal” parameters, resp. $d_0$, $d_1$ as
vascular parameters.
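Models 1 and 2 can be evaluated with a few lines of Python. The sketch below is illustrative and makes two assumptions: the dispersions enter squared in the exponent, and the convolution integral is approximated by a midpoint Riemann sum with a hypothetical step size `dt`. The function names are not taken from the original work.

```python
import math

def gaussian_hr(t, a, d0, t0, b):
    """Model 1: Gaussian response with gain a, dispersion d0, lag t0, baseline b."""
    return a * math.exp(-(t - t0) ** 2 / (2.0 * d0 ** 2)) + b

def asym_gauss(t, d0, d1):
    """Modulation function: dispersion d0 for t < 0 (rising edge),
    d1 for t >= 0 (falling edge)."""
    d = d0 if t < 0.0 else d1
    return math.exp(-t * t / (2.0 * d * d))

def convolved_hr(t, a, t0, t1, d0, d1, b, dt=0.05):
    """Model 2: square wave of height a on [t0, t0 + t1) convolved with
    asym_gauss, plus baseline b. Only the support of the square wave
    contributes, so the integral runs over that interval."""
    steps = max(1, int(round(t1 / dt)))
    h = t1 / steps
    acc = 0.0
    for k in range(steps):
        s = t0 + (k + 0.5) * h          # midpoint inside the stimulation interval
        acc += a * asym_gauss(t - s, d0, d1) * h
    return acc + b
```

With $d_0 = d_1$ the convolved response is symmetric about the middle of the stimulation interval; choosing $d_0 < d_1$ gives the shorter rise than fall time noted for responses to short stimuli.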



Model 3: Convolved Compartment Model. In model 2, the formulation of
a stepwise defined Gaussian function for the hemodynamic modulation function
$f(\cdot)$ is still heuristic. It is physiologically more plausible to model the
hemodynamic modulation process by a compartment model. We define the HR model
function g(t) as in (4) and the neuronal stimulation n(t) as in (5), and now focus
on a new definition of f(t).

For the BOLD contrast, as discussed in the introduction, it is viable to think
of the oxygenated blood as an “endogenous tracer” of brain activation. The
kinetics of external tracers such as radioactive markers or pharmaceuticals have
successfully been modeled by compartment models since 1920 [?]. This modeling
context is rich and well understood (for introductions, see [12, 14]).
Compartments correspond to body subspaces (e.g. tissue, vasculature), in which
the local concentration of a tracer (e.g. oxygenated blood) is modified by
transport between compartments (e.g. by diffusion, flow) or active processes (e.g. by
consumption). If we assume a linear imaging process, then the HR measured
in fMRI is proportional to the HbO2 concentration, and a compartment model
should allow us to draw conclusions about the temporal oxygen flow pattern.
Such a model is depicted in Fig. 2.

HbO2 flows from the arterial into the capillary compartment at a rate $\gamma_0$,
as mediated by a consumption process in the tissue compartment. The oxygen
exchange between capillaries and tissue is described by rates $\gamma_1$ and $\gamma_2$. Finally,
rate $\gamma_3$ denotes the HbO2 drainage into the venous compartment. Assuming
constant rates, the kinetic equations for the compartment model in Fig. 2 are:

$$\begin{aligned} \dot{f}_1 &= -\gamma_0 f_1, \\ \dot{f}_2 &= \gamma_0 f_1 + \gamma_2 f_3 - (\gamma_1 + \gamma_3) f_2, \\ \dot{f}_3 &= \gamma_1 f_2 - \gamma_2 f_3. \end{aligned} \qquad (7)$$

Fig. 2. Movement of oxygen in two vascular compartments and a tissue compartment

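The solution to this linear system of differential equations can be written as a
sum-of-exponentials model for the i-th compartment (see [12], p. 379ff):

$$f_i = k_{i1} \exp(-\gamma_0 t) + k_{i2} \exp(\lambda_0 t) + k_{i3} \exp(\lambda_1 t), \qquad (8)$$

where the parameters $\lambda_0$, $\lambda_1$ and $k_{ij}$ are:

$$\begin{aligned}
\lambda_0 &= -\tfrac{1}{2}\left(\gamma_1 + \gamma_2 + \gamma_3 + \sqrt{(\gamma_1 + \gamma_2 + \gamma_3)^2 - 4\gamma_2\gamma_3}\right), \\
\lambda_1 &= -\tfrac{1}{2}\left(\gamma_1 + \gamma_2 + \gamma_3 - \sqrt{(\gamma_1 + \gamma_2 + \gamma_3)^2 - 4\gamma_2\gamma_3}\right), \\
k_{1j} &= [1, 0, 0], \\
k_{2j} &= \left[ \frac{\gamma_0(\gamma_2 - \gamma_0)}{(\gamma_0 + \lambda_0)(\gamma_0 + \lambda_1)},\ \frac{\gamma_0(\gamma_2 + \lambda_0)}{(\gamma_0 + \lambda_0)(\lambda_0 - \lambda_1)},\ \frac{\gamma_0(\gamma_2 + \lambda_1)}{(\gamma_0 + \lambda_1)(\lambda_1 - \lambda_0)} \right], \\
k_{3j} &= \left[ \frac{\gamma_0\gamma_1(\gamma_2 - \gamma_0)}{(\gamma_0 + \lambda_0)(\gamma_0 + \lambda_1)},\ \frac{\gamma_0\gamma_1(\gamma_2 + \lambda_0)}{(\gamma_0 + \lambda_0)(\lambda_0 - \lambda_1)},\ \frac{\gamma_0\gamma_1(\gamma_2 + \lambda_1)}{(\gamma_0 + \lambda_1)(\lambda_1 - \lambda_0)} \right].
\end{aligned} \qquad (9)$$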

The parameter vector $\beta$ for this model consists of 8 items ($\gamma_i$: 4 transfer rates,
a: gain, $t_0$: neuronal response onset, $t_1$: neuronal response duration, b: offset).
We attribute a, $t_0$, and $t_1$ as “neuronal” parameters, resp. the transfer rates as
vascular parameters.
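The compartment kinetics can be checked numerically without relying on the closed-form coefficients. The Python sketch below is a hedged illustration: it integrates the three rate equations with a classical fourth-order Runge-Kutta scheme from the assumed initial state $f = (1, 0, 0)$ and, for the capillary compartment only, compares the result with a sum of three exponentials. It assumes that the tissue compartment is fed at rate $\gamma_1$, as mass balance with the capillary equation requires; all function names are hypothetical.

```python
import math

def eigenvalues(g1, g2, g3):
    """lambda_0 and lambda_1 of the capillary/tissue subsystem."""
    s = g1 + g2 + g3
    root = math.sqrt(s * s - 4.0 * g2 * g3)
    return -0.5 * (s + root), -0.5 * (s - root)

def derivs(f, g0, g1, g2, g3):
    """Right-hand side of the kinetic equations (ASSUMPTION: tissue inflow g1 * f2)."""
    f1, f2, f3 = f
    return (-g0 * f1,
            g0 * f1 + g2 * f3 - (g1 + g3) * f2,
            g1 * f2 - g2 * f3)

def integrate(g0, g1, g2, g3, t_end, dt=0.01):
    """Classical RK4 integration starting from f = (1, 0, 0)."""
    f = (1.0, 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        k1 = derivs(f, g0, g1, g2, g3)
        k2 = derivs(tuple(x + 0.5 * dt * k for x, k in zip(f, k1)), g0, g1, g2, g3)
        k3 = derivs(tuple(x + 0.5 * dt * k for x, k in zip(f, k2)), g0, g1, g2, g3)
        k4 = derivs(tuple(x + dt * k for x, k in zip(f, k3)), g0, g1, g2, g3)
        f = tuple(x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                  for x, a, b, c, d in zip(f, k1, k2, k3, k4))
    return f

def f2_closed_form(t, g0, g1, g2, g3):
    """Capillary compartment as a sum of three exponentials."""
    l0, l1 = eigenvalues(g1, g2, g3)
    k21 = g0 * (g2 - g0) / ((g0 + l0) * (g0 + l1))
    k22 = g0 * (g2 + l0) / ((g0 + l0) * (l0 - l1))
    k23 = g0 * (g2 + l1) / ((g0 + l1) * (l1 - l0))
    return k21 * math.exp(-g0 * t) + k22 * math.exp(l0 * t) + k23 * math.exp(l1 * t)
```

The closed form breaks down when $\gamma_0$ coincides with $-\lambda_0$ or $-\lambda_1$, whereas the numerical integration has no such restriction.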

2.2 Stochastic Background Model 

It was shown [?] that the stochastic part in preprocessed fMRI data may
approximately be described by an Ornstein-Uhlenbeck process [?]: (1) it is
stationary with respect to time, (2) its elements are normally distributed with
a covariance matrix $\mathbf{V}$ (see (2)), and (3) their correlation is described by an AR(1)
model. We assume that the spatio-temporal covariance matrix $\mathbf{V}$ is separable in
space and time:

$$\mathbf{V} = \mathbf{S} \otimes \mathbf{T}, \qquad (10)$$



where $\otimes$ denotes the Kronecker product. The elements of the (spatial) covariance
matrix $\mathbf{S}$ are given by the variance $s_{ii} = \sigma^2$ and the covariance $s_{ij} = \mathrm{cov}(h)$, which
depends on the distance h of the voxels i and j. Most easily, a semivariogram $\eta(h)$
[?] is used to determine the type of stationary dependence in the data:

$$\eta(h) = \sigma^2 - \mathrm{cov}(h) \approx \frac{1}{2 n_h} \sum_{(i,j) \in N(h)} (y_i - y_j)^2, \qquad (11)$$



where $N(h)$ is the set of voxel pairs at distance h, and $n_h$ is the number of
pairs in the set. For an AR(1) process with positive correlations, an exponential
function fits the semivariogram:

$$\eta(h) = \alpha_0 (1 - \exp(-\alpha_1 h)), \qquad (12)$$



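where h is the distance between voxel sites. From the model parameters, we
can derive the variance $\sigma^2 = \alpha_0$ and the autocorrelation $\rho = \exp(-\alpha_1)$. The
covariance matrix $\mathbf{S}$ of a linear array of k voxels is defined as:

$$\mathbf{S} = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{k-1} \\ \rho & 1 & \rho & \cdots & \rho^{k-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{k-3} \\ \vdots & & & \ddots & \vdots \\ \rho^{k-1} & \cdots & & & 1 \end{pmatrix} \qquad (13)$$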





Similarly, a matrix $\mathbf{T}$ is formed for the temporal domain and composed as given
in (10).
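The construction of the separable covariance can be sketched in plain Python. The function names are hypothetical and nested lists stand in for matrices; the parameter mapping follows the relations $\sigma^2 = \alpha_0$ and $\rho = \exp(-\alpha_1)$ stated in the text.

```python
import math

def ar1_params(alpha0, alpha1):
    """Variance and lag-one autocorrelation from the fitted semivariogram."""
    return alpha0, math.exp(-alpha1)

def ar1_covariance(k, sigma2, rho):
    """AR(1) covariance of a linear array: entry (i, j) is sigma2 * rho**|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(k)] for i in range(k)]

def kronecker(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]
```

For instance, `kronecker(ar1_covariance(3, 2.0, 0.5), ar1_covariance(2, 1.0, 0.1))` yields a 6 × 6 spatio-temporal covariance for 3 voxels and 2 timesteps.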



2.3 Estimation 

We find the ML estimate $\hat{\beta}$ of our model parameters as the vector $\beta$ that
minimizes the quantity:

$$\arg\min_{\beta} \left\{ (\mathbf{y} - g(\mathbf{t}, \beta))^T \mathbf{V}^{-1} (\mathbf{y} - g(\mathbf{t}, \beta)) \right\}. \qquad (14)$$

In the case of the Gaussian function in model 1, this problem corresponds to a
4-dimensional nonlinear minimization problem, which can easily be solved by the
downhill simplex method of Nelder and Mead [?]. This method is not feasible
with the more complex models 2 and 3, where the cost function (14) is expected
to possess multiple local minima. Because derivatives of the model functions are
only available as finite difference approximations, derivative-free optimization
methods are preferable. We investigated the use of (1) a combination of
simulated annealing with the downhill simplex method [?], (2) Shor’s minimization
method [?], and (3) an optimization using a genetic algorithm [21].



2.4 Confidence Limits and Statistical Tests



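Using a first-order linear model, we can derive confidence limits for the estimation
from the inverse of the Fisher information matrix $\mathbf{F}$ [?]:

$$\hat{\beta} \sim N(\beta, \mathbf{F}^{-1}), \quad \text{where} \quad \mathbf{F} = \mathbf{G}^T \mathbf{V}^{-1} \mathbf{G}, \qquad (15)$$

and $\mathbf{G}$ denotes the Jacobian matrix of $g(\cdot)$ with respect to $\beta$.

A simple measure for the goodness-of-fit (GOF) is given by the $\chi^2$-statistic:

$$\chi^2 = \mathbf{e}^T \mathbf{V}^{-1} \mathbf{e}, \quad \text{where} \quad \mathbf{e} = \mathbf{y} - g(\mathbf{t}, \hat{\beta}). \qquad (16)$$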

A more complex measure is derived for the F-statistic, following Hartley:

$$P = \ldots \qquad (17)$$

$$F_{p,n-p} = \frac{(n - p)}{p} \, \frac{e^T P e}{e^T (I_n - P) e} , \qquad (18)$$

where n corresponds to the number of data points, p to the number of parameters,
and $I_n$ is the n x n identity matrix.



3 Experiments 

To study the usefulness of this modeling approach, we re-evaluated datasets
acquired in an fMRI study of working memory.

Behavioral Experiment: Subjects learned three sets of letters (4, 6 or 8 characters)
at least two days before the scanning session. A trial started with the
display of a small red box (for 800 ms), followed by the cue and, after a delay (0,
2 or 4 seconds), the probe. Subjects had to indicate by a button press whether
the probe item belonged to the cued set. 108 randomized trials were run using
an intertrial interval of 18 s.

fMRI Parameters: During the behavioral experiment, 7 axial slices (64 x 64
voxels, 3.8 x 3.8 x 5 mm voxel size, 2 mm gap) were recorded on a Bruker Medspec
300 system using an EPI protocol with a repetition time of 1 s. All timings were
corrected for the slice acquisition delay in the EPI protocol.

Preprocessing: We randomly selected data obtained from 4 subjects. Data
were preprocessed by (1) correction for in-plane movements, (2) correction
for baseline fluctuations, and (3) lowpass filtering in the temporal domain to reduce
the amount of system and physiological noise. As a result
of this preprocessing, only the fundamental frequency (corresponding to the
stimulation) and its first harmonic were retained in the temporal domain of the
data.

Definition of ROIs: Standard procedures were applied to detect functional activation
in the datasets: (1) analysis for activated regions by Pearson correlation
with a time-shifted box-car waveform (Δ = 6 s), (2) conversion of the correlation
coefficient into z-scores and thresholding of the corresponding z-map by a score
of 10, (3) assessment of the activated regions for their significance on the basis
of their spatial extent. Having detected voxels with functional activation,
we defined ROIs by collecting the 6 most highly activated voxels around
local maxima in the z-map. We obtained a total of 94 ROIs from 4 subjects. An
illustrative map of ROIs is shown in Fig. 3.
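The first two detection steps can be sketched in a few lines of Python. This is only an illustration: the stimulation window (`onset`, `width`), the toy voxel signal and the use of the Fisher transform for the conversion to z-scores are assumptions, since the text does not state how the conversion was performed.

```python
import math


def boxcar(n_steps, onset, width, shift):
    """Box-car reference: 1 inside the time-shifted stimulation window, 0 elsewhere."""
    start = onset + shift
    return [1.0 if start <= t < start + width else 0.0 for t in range(n_steps)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, n):
    """Assumed conversion: Fisher transform scaled by sqrt(n - 3); requires |r| < 1."""
    return math.atanh(r) * math.sqrt(n - 3)


# One trial of 18 samples at a repetition time of 1 s; the shift of 6 s follows the text.
reference = boxcar(n_steps=18, onset=0, width=4, shift=6)
# Hypothetical voxel signal: baseline + response + small deterministic ripple.
voxel = [10.0 + 3.0 * v + 0.5 * ((t % 3) - 1) for t, v in enumerate(reference)]

r = pearson(voxel, reference)
z = fisher_z(r, len(voxel))
print(round(r, 3), round(z, 2))
```

Thresholding the resulting z-map and checking the spatial extent of the surviving clusters would follow as separate steps.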




Fig. 3. An illustrative z-map overlaid onto the corresponding anatomical image from the
working memory experiment. Neurofunctionally interesting ROIs are labeled: AIl: superior
anterior insula, MPCl: middle prefrontal cortex, IPGl: inferior precentral gyrus,
CMA: cingulate motor area, MCl: motor cortex, PPCl: posterior parietal cortex,
SCl: sensory cortex



Averaging: To reduce the number of estimations, we averaged voxels within
a ROI at a given timestep and across trials with the same delay time manipulation
(0, 2, and 4 s). Per ROI, we thus obtained three different timecourses of
18 timepoints each. As a consequence of averaging in space, we simplified our
estimation model by setting $S = I_k$, where k = 6.
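The averaging step can be illustrated with a small Python sketch. The nested-list data layout (`data[trial][voxel][timestep]`), the function name and the toy numbers are assumptions made for the example only.

```python
def average_timecourses(data, delays):
    """Average over the voxels of a ROI and over all trials sharing a delay.

    data[trial][voxel][timestep] holds the signal; delays[trial] is the delay (0, 2 or 4 s).
    Returns {delay: averaged timecourse}.
    """
    averaged = {}
    for delay in sorted(set(delays)):
        trials = [trial for trial, d in zip(data, delays) if d == delay]
        n_steps = len(trials[0][0])
        n_series = sum(len(trial) for trial in trials)   # voxels x trials
        averaged[delay] = [
            sum(voxel[s] for trial in trials for voxel in trial) / n_series
            for s in range(n_steps)
        ]
    return averaged


# Toy example: 3 trials, 2 voxels per ROI, 3 timesteps (the study used 6 voxels, 18 timesteps).
data = [
    [[1, 2, 3], [3, 4, 5]],   # trial with delay 0
    [[5, 6, 7], [7, 8, 9]],   # trial with delay 0
    [[0, 0, 0], [2, 2, 2]],   # trial with delay 2
]
delays = [0, 0, 2]
hr = average_timecourses(data, delays)
print(hr)
```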

Tests: We fitted the 3 HR models defined in the preceding section to
the 3 averaged timecourses in the 94 ROIs. To achieve realistic solutions, we
constrained the solution space by the following intervals:

- model 1: gain: 0 <= a < 5000, dispersion: 0 <= $d_0$ < 10, lag: 0 <= $t_0$ < 10,
and baseline: -500 < b <= 0,

- model 2: onset and duration: 0 <= $t_i$ < 10, dispersions, gain and offset as
above,

- model 3: diffusion constants: 0 <= $\gamma_i$ < 10, onset, duration, gain and baseline
as above.
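One simple way to enforce such intervals inside a derivative-free optimizer is to check or clip candidate parameter vectors. The sketch below does this for model 1; the dictionary layout and function names are hypothetical, and the half-open intervals listed above are treated as closed for simplicity.

```python
# Search intervals for model 1 as listed above (treated as closed intervals here).
MODEL1_BOUNDS = {
    "a": (0.0, 5000.0),    # gain
    "d0": (0.0, 10.0),     # dispersion [s]
    "t0": (0.0, 10.0),     # lag [s]
    "b": (-500.0, 0.0),    # baseline
}


def within_bounds(params, bounds=MODEL1_BOUNDS):
    """True if every parameter lies inside its interval."""
    return all(lo <= params[name] <= hi for name, (lo, hi) in bounds.items())


def clip_to_bounds(params, bounds=MODEL1_BOUNDS):
    """Project a candidate parameter set back onto the allowed box."""
    return {name: min(max(params[name], lo), hi) for name, (lo, hi) in bounds.items()}


candidate = {"a": 6000.0, "d0": -1.0, "t0": 3.0, "b": 10.0}
print(within_bounds(candidate))    # False
print(clip_to_bounds(candidate))
```

In a genetic algorithm, clipping could for instance be applied after each mutation or crossover; whether the original implementation did so is not stated.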

For model 1, the downhill simplex algorithm was applied, with computation 
times of less than a second per estimation. For models 2 and 3, we achieved the 
best GOFs using the genetic algorithm. Parameters of the genetic optimization 
process were: 1000 generations, 500 population members, p(exchange) = 0.2, 
p(mutation) = 0.01, p(crossover) = 0.2. 

4 Results



To give an impression of typical waveforms and modeling results, we selected
a signal from the left motor cortex MCl (see Fig. 3) in one of the subjects.
Averaged HRs for the different experimental delay times are shown in Fig. 4.





Fig. 4. HRs from region MCl for the 3 different delay times 



For longer delays, the HR in this region was higher, and the time-to-maximum and
duration were longer. This was reflected in the parameters of the 3 HR models
(see Table 1). For all models, the increasing height of the HR with delay time was
found as an increase of the gain a. For model 1, the shift of the time-to-maximum
led to an increase of $t_0$, the increasing width to an increase in the dispersion
$d_0$. For model 2, the parameters attributed to the hemodynamic modulation
($d_0$ and $d_1$) were relatively independent of the delay time manipulation. Shift
and delay were reflected in increasing values of $t_0$ and $t_1$. Finally, with the
convolved compartment model, only an increase of $t_1$ with the delay was found;
the rate constants were similar.



Table 1. HR parameters for the signals in Fig. 4 for the 3 different experimental delays

Model 1: Gaussian Function

Delay [s]      a    t0 [s]   d0 [s]      χ²
0            324     3.95     2.78     2381
2            429     4.34     3.21     1270
4            496     5.05     3.34     1994

Model 2: Convolved Asymmetric Gaussian Function

Delay [s]      a    t0 [s]   t1 [s]   d0 [s]   d1 [s]      χ²
0           2324     2.58     3.31     2.00     3.30      724
2           3420     3.45     3.55     2.94     3.07     1193
4           4062     3.36     5.05     2.78     3.04     1625

Model 3: Convolved Compartment Model

Delay [s]      a    t0 [s]   t1 [s]     γ0       γ1       γ2       γ3       χ²
0           1847     0.06     5.63     0.66     6.11     4.60     1.27      770
2           2439     0.03     6.47     0.39     6.50     6.10     1.54     1934
4           3056     0.06     7.24     0.60     6.17     6.56     1.52     2161




By inspection of the waveforms, we typically found that ROIs in all subjects
showed an increase of the time-to-maximum and the gain with increasing delay
time, similar to the example HR in Fig. 4. A closer examination of the estimated
modeling parameters by a cluster analysis revealed differences, which allowed us
to group ROIs into 4 categories (see Table 2):

- Group 1: early rise, little dependence on the delay time manipulation. ROIs
of this category were found in cortical areas that are relevant for encoding
the stimulus. Examples include the posterior parietal cortex PPCl.

- Group 2: early rise, delay dependence. ROIs of this category are relevant for
maintaining the stimulus. Examples include the anterior insula AIl.

- Group 3: late rise, delay dependence. ROIs take part in the decision process
following the delay and in generating the motor response. An ROI in the
primary motor cortex (MCl) belongs to this group.

- Group 4: late rise, little delay dependence. An example for this group is given
by the sensory cortex (SCl): subjects left their finger on the response
button independent of the delay time.

In accordance with the observation of general delay dependence, most ROIs
belong to either group 2 or group 3. It is interesting to note that most early
activated areas in group 2 exhibited their delay time dependence in the duration
time $t_1$, while late responses in group 3 showed a delay time dependence in the
end time $t_e$. This finding may be interpreted as a pre-activation of group 3 areas
during the delay phase: i.e. the motor cortex is “held active” until the response
decision following the delay period.

Table 2. Onset ($t_0$), duration ($t_1$) and end time ($t_e = t_0 + t_1$) of the neuronal activation
as estimated by model 2. Examples for 4 groups defined by their different temporal
behaviour are shown. ROI labels correspond to Fig. 3; all values are given in [s]

         Group 1: PPCl        Group 2: AIl         Group 3: MCl         Group 4: SCl
Delay    t0    t1    te       t0    t1    te       t0    t1    te       t0    t1    te
0       2.90  1.74  4.64     3.27  0.71  3.98     4.02  1.46  5.48     4.04  4.26  8.30
2       2.78  2.54  5.32     2.86  1.85  4.91     3.71  4.68  8.39     4.80  4.34  9.14
4       3.90  1.64  5.54     2.92  3.61  6.53     4.92  4.89  9.81     4.98  4.93  9.91

Experiences with the 3 models can be summarized as follows:

- Model 1: For the Gaussian model, 3 parameters describe the shape of the
response: the lag $t_0$, the dispersion $d_0$, and the gain a. It was shown that
these parameters are interpretable in terms of the experimental stimulation.
However, there is no distinction between parameters describing the hemodynamic
modulation and neuronal activation in this model. Thus, no decision
is possible whether a wide HR is due to a longer activation (i.e. a neuronal
effect) or a longer dispersion (i.e. a hemodynamic effect). On the other hand, good
convergence properties allowed us to use a rather simple and very efficient
optimization scheme.

- Model 2: Fits are better in comparison with model 1, often down to $\chi^2 \approx 50$,
which was a consequence of modeling the HR asymmetry by two different
dispersion parameters. As was suspected previously, HRs which arise
early and follow short stimuli were found to be asymmetric, with a shorter rising
edge $d_0$ (typically 2-3 s) than falling edge $d_1$ (typically 3-4 s). Late and
wide HRs tend to be symmetric with dispersions in the order of 3-4 s. The
attributed neuronal activation parameters, onset $t_0$ and duration $t_1$, are
interpretable in the context of the fMRI experiment. A genetic algorithm
was necessary to optimize this model, so there is a marked increase in the
computation time (12 min per estimation) in comparison with the previous
model.

- Model 3: We fitted HRs to the model equations for both the capillary compartment
2 and the tissue compartment 3. Fits for both compartments are
comparable with model 1, with slightly better values for compartment 2.
This is in agreement with the mechanism of the BOLD effect: the fMRI signal
arises from the vascular compartment. Rates $\gamma_0$ (inflow) and $\gamma_3$ (outflow)
(see the compartment model figure) were found between 0.3 and 0.7, rates $\gamma_1$
(vessel to tissue) and $\gamma_2$ (tissue to vessel) in the order of 6-9. This is
interpretable as an easy transfer (of oxygen) between the vascular and tissue
compartment, while the sluggish (active) effect of vascular dilatation and
constriction was modeled by low inflow and outflow rates. Onset times $t_0$ were
always below 1 s, and the duration was in the order of 4-7 s. With the current
formulation and optimization approach, this model needs the rather high amount
of computation time of 32 min per waveform.

5 Discussion 

The description of the HR by a model function is considered an advantage
because it provides a compact and concise parameterization of the HR shape.
Current choices of model functions are arbitrary, since none is based on a comprehensive
physiological model. Our preference for the Gaussian function in previous
studies was only justified by the fact that we observed the best fits in a
non-linear estimation procedure.

In this study we tested the feasibility of separating “neuronal” from “vascu- 
lar” parameters by introducing complex HR model functions. Separating hemo- 
dynamic from neuronal factors is highly desirable in cognitive research, not only 
to better characterize the neuronal mechanisms of a cognitive task, but also to 
better understand the reasons for interindividual differences in terms of “good” 
and “bad” responders in fMRI experiments. 

From experiences with model 2 we confirmed that asymmetries are present in 
HRs. By including parameters to adapt to asymmetries, marked improvements 
in the fits were achieved, especially with brief stimuli and early responses. The 
introduction of the convolution operation in the modeling context allowed us to 
separate parameters. However, no experimental justification yet exists for the 
designation as neuronal or vascular properties other than the conformance of 
results with the current understanding of cognitive processes involved in the 
example fMRI study. However, it is rather easy to design fMRI experiments 
better targeted towards a justification of this hypothesis. 

The non-linear regression model allows the use of complex,
highly non-linear functions in our problem domain. We had to resort to a costly 
optimization method (the genetic algorithm) and to averaged waveforms instead 
of using single trial data directly. From this feasibility study we learned that it is 
possible to derive rather narrow limits for hemodynamic modulation parameters. 

An interesting reformulation of the compartment model follows from the
observation that oxygen delivery to the tissue compartment obeys a Hill-type
equation, i.e. transfer rates $\gamma_i$ from the vascular to the tissue compartment
are non-linear functions of the oxygen tension. At least in healthy subjects, this
functional dependency is well described and thus may be introduced in a more
complex formulation of the compartment model. Since usually only a few
timesteps per trial are recorded, there is an upper bound for the number of
parameters of any model function.

We regard HR modeling as a new tool in fMRI data analysis which will 
lead to a deeper understanding of the mechanisms underlying the physiological 
and neuronal basis of brain functioning. Models as proposed in this paper open
another approach for investigating the dynamical properties of the brain. 

Abstract. This paper describes largely automated methods of creating
connected, 3D vascular trees from individual vessels segmented from
magnetic resonance angiograms. Vessel segmentation is initiated by user- 
supplied seed points, with automatic calculation of vessel skeletons as 
image intensity ridges and automatic estimation of vessel widths via 
medialness calculations. The tree-creation process employs a variant of 
the minimum spanning tree algorithm and evaluates image intensities 
at each proposed connection point. We evaluate the accuracy of nodal 
connections by registering a 3D vascular tree with 4 digital subtraction 
angiograms (DSAs) obtained from the same patient, and by asking two 
neuroradiologists to evaluate each nodal connection on each DSA view. 
No connection was judged incorrect. The approach permits new, clini- 
cally useful visualizations of the intracerebral vasculature. 



1 Introduction 

Neurosurgeons and interventional radiologists must often occlude blood vessels 
during vascular procedures. The risk of stroke to the patient depends largely 
upon the collateral flow provided by other parts of the circulation. It is therefore 
important for the clinician to visualize vascular connections in order to make 
correct decisions about vessel occlusion. 

Three types of medical images provide vascular information. The first is 3D
data acquisition, such as computed tomographic or magnetic resonance angiography
(CTA or MRA). These studies do not explicitly define vascular connections.
The second method is digital subtraction angiography (DSA), which produces
localized projection images of the circulation in a form that is usually difficult to
interpret in 3D. Neither of these imaging methods provides the clinician with
direct, 3D information about vascular connections.

The third method of vascular visualization, currently under development by
several commercial companies, is 3D reconstruction of a series of DSA images
obtained in an arc. Each contrast injection opacifies a vascular subtree that
is visualized from multiple points of view. The subtree is then reconstructed into
3D using an approach similar to that used to create CT datasets. This approach 
still does not provide the necessary connectivity information, however. Although, 
in theory, one could create a connected vascular map of the entire circulation 
by performing hundreds of contrast injections, reconstructing each sequence into 
3D, and then concatenating the results with a knowledge-based approach, the 
time required and the toxicity of the contrast agent preclude such methods. 
The fundamental problem is that, like MRA and CTA, 3D-DSA provides the 
clinician only visualizations based upon image intensity values. 

We share this interest in 3D DSA reconstruction. However, we believe
that the utility of 3D-DSA is severely limited without an associated directed
graph description of the vasculature. A symbolic representation of vascular con- 
nectivity is necessary in order to estimate collateral flow to a region, to simulate 
catheter motion through a 3D vascular tree, or to permit ready identification on 
projection images of safe occlusion points. It is exactly this kind of information 
that the clinician needs to know. No current imaging modality (MRA, DSA, or 
3D-DSA) can provide this kind of information directly. 

This paper describes methods of producing directed graphs of the intracere- 
bral vasculature from segmented MRA data. The same approach could also be 
applied to 3D-DSA images. We intend these graphs for use under conditions 
in which errors may produce patient injury. Four requirements must be met in 
order for these graphs to be clinically useful. First, the segmentation must be 
accurate and complete in the region of interest. Second, the parent-child associ- 
ations must be correct. Third, the accuracy of the construct must be evaluable. 
Finally, editing and display tools are required. 

This paper discusses issues 2-4 above. Our MRA segmentation method is
described elsewhere and is only outlined here. The current report focuses
upon automated methods of producing directed graphs from segmented vessels. 
We evaluate the accuracy of nodal connections in a final tree by comparison of 
projections of our construct to the “gold standard” of DSA. Finally, we describe 
editing and display tools. Our aim is to provide symbolic vascular descriptions 
that can be used effectively under clinical conditions of high risk. 



2 Issues in Vascular Model Creation and Testing 

This study employs both MRA and DSA images. This section outlines four im- 
portant facts about the intracerebral circulation and the two imaging modalities 
employed. The term “vessel” refers to an unbranched, 3D, vascular segment. 

First, an MRA contains vessels not seen by DSA. Human beings usually 
have 3 intracerebral arterial circulations arising from different parent vessels. 
An MRA provides a 3D image that shows all 3 circulations simultaneously. A 
DSA, however, provides 2D projections of child vessels opacified following focal 
injection of contrast. An angiogram therefore depicts flow only within a single 
subtree. Since an MRA visualizes all 3 circulations, it contains information a 
DSA does not. 







E. Bullitt et al. 



Second, a DSA contains projections of vessels not seen by MRA. A DSA fills 
vessels of many widths. An MRA tends to contain only the larger vessels. DSA 
therefore provides more vascular detail than MRA. Many MRA studies also do 
not include the full head. A DSA therefore contains information an MRA does 
not. 

Third, both DSA and MRA contain distortion errors that may interfere with
MRA-DSA registration. DSAs contain geometrical distortions such as pincushion
effects. We (and others) can largely correct such errors. The distortions produced
by MR are more difficult. Although machine-specific and patient-specific
flow errors can be reduced or eliminated, it is more difficult to correct
distortions at a tissue-air interface. Such errors are reported to displace objects
by as much as a centimeter and tend to occur at the skull base and brain
surface.

Finally, the human intracerebral circulation is complex, plethoric, and vari- 
able. Multiple vessels exist in the same region of space. The track of an individual 
vessel can contain tight loops. Even the 3 major circulatory groups are connected 
differently in different patients via the Circle of Willis at the base of the skull. 
It is thus impossible to create a single model applicable to all patients. Fig. 1
illustrates the circulation’s complexity, the differences between DSA and MRA, 
and an example of MRA segmentation. The segmentation method is outlined 
later. 







Fig. 1. DSA and MRA. (a) Left internal carotid DSA from the front. DSAs show only
a portion of the circulation. No vessels fill on the right side or back of the head; these
areas are supplied by different parent vessels. (b) Volume rendered MRA from below.
Vessels occupy the entire head but detail is missing. (c) Vessels segmented from the
MRA shown in (b) and projected from the same point of view. An aneurysm is at image
center



This study creates a directed graph from segmented MRA vessels and tests 
the accuracy of nodal connections by projecting each parent-child connection 
against a sequence of DSA images obtained from the same patient. Evaluation is 
only possible for the set of vessels the two imaging modalities hold in common. 
Registration is imperfect at the skull base and for some peripheral vessels close
to the brain surface. 

3 Methods 

3.1 Image Acquisition and Image Distortion Correction 

The vessel segmentation method does not require a specific image acquisition 
protocol, and is applicable to both CT and MR data. For this study, 3D, time 
of flight MRA was performed in a Siemens 1.5 T Vision unit with a quadrature 
head coil and with magnetization transfer suppression. Images were acquired in 
the axial projection over a 7.6 cm volume, using 69 contiguous 1.1 mm sections 
and an x and y spacing of 0.85 mm. Machine-specific distortion errors were 
evaluated by imaging a Siemens multi-purpose phantom. The major problem 
found was a 10% error in interslice spacing, which our software now corrects. 

We use DSAs to evaluate the connections made by the tree creation program. 
The case analyzed in this report employed four 459 x 484 pixel DSA images 
(AP, lateral, LAO, and modified LAO) obtained from a portable Diasonics OEC 
digital angiographic unit. The field of view was variously 8 or 12 degrees. This 
report also includes a picture of a vessel tree created from a different patient’s 
MRA and registered with a high resolution (1024 x 1024 pixel) lateral DSA 
obtained using a Siemens Multistar digital angiographic unit. 

Major distortions in the 2D images were corrected by imaging a finely milled 
crosshair phantom grid placed on the image plane. Phantom images were ob- 
tained using a variety of fluoroscopic positions. Each DSA was then corrected 
for distortion via a landmark-based system and interpolation by triangles to 
adjust the spatial location of each pixel. The greyscale value at each (x,y) point 
in the corrected image was then determined by interpolation. For the highly dis- 
torted OEC images, the image size after correction was 476 x 476 pixels (pixel 
size 0.3 mm). 

3.2 MRA Segmentation 

The MRA segmentation method makes use of the geometry of blood vessels. As 
outlined below, extraction of a vessel involves 3 steps: definition of a seed point, 
automatic extraction of an image intensity ridge representing the vessel’s central 
skeleton, and automatic determination of width at each skeleton point. Further
details are provided by Aylward.

Extraction of each vessel begins from a user-supplied seed point. The user 
views a set of MRA slices and clicks on a point within a vessel, simultaneously 
supplying a rough estimate of that vessel’s width. The method then automati- 
cally extracts the central skeleton of the indicated vessel beginning from the seed 
point. Vessels can be viewed as 3D tubular objects delineated from background 
by contrast differences. This combination of geometry and intensity means that 
blurring the image creates a central intensity ridge along each vessel. This intensity
ridge is extracted via the height ridge definition:

Define: I as the intensity at x,
        H as the Hessian of I at x,
        $v_i$ and $a_i$ as the eigenvectors and associated eigenvalues of H,
        where $a_1 < a_2 < a_3$.

Then for the program to classify x as being on a ridge, it must be true that:

$$a_2 / \sqrt{a_1^2 + a_2^2 + a_3^2} \lesssim -0.5 , \qquad (1)$$

$$v_1 \cdot \nabla I \approx 0 \quad \text{and} \quad v_2 \cdot \nabla I \approx 0 . \qquad (2)$$


Equation 1 states that most of the local curvature should be captured by 
the two principal components of the Hessian (hence 0.5) and that the curvature 
should be negative, corresponding to a ridge rather than to a valley. 
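The two ridge conditions can be checked numerically once the Hessian and gradient at a point are available. The following NumPy sketch is a minimal illustration, not the segmentation program itself: the function name, the gradient tolerance `grad_tol` (which stands in for the "approximately zero" of Eq. 2) and the test Hessians are assumptions.

```python
import numpy as np


def is_ridge_point(hessian, gradient, curv_thresh=-0.5, grad_tol=1e-2):
    """Check the height-ridge conditions of Eqs. (1) and (2) at one point.

    hessian: 3x3 symmetric matrix; gradient: length-3 vector of the blurred image.
    grad_tol is a hypothetical tolerance for the 'approximately zero' test.
    """
    # eigh returns eigenvalues in ascending order (a[0] <= a[1] <= a[2])
    # and the matching eigenvectors as columns of v.
    a, v = np.linalg.eigh(np.asarray(hessian, dtype=float))
    norm = np.sqrt(np.sum(a ** 2))
    if norm == 0.0:
        return False
    curved_enough = a[1] / norm <= curv_thresh            # Eq. (1)
    g = np.asarray(gradient, dtype=float)
    flat_across = (abs(v[:, 0] @ g) <= grad_tol and
                   abs(v[:, 1] @ g) <= grad_tol)          # Eq. (2)
    return bool(curved_enough and flat_across)


# Idealized bright tube along z: strong negative curvature in x and y, none along z.
tube = np.diag([-2.0, -2.0, 0.0])
print(is_ridge_point(tube, [0.0, 0.0, 1.0]))   # on the ridge
print(is_ridge_point(tube, [1.0, 0.0, 0.0]))   # gradient across the tube: off the ridge
```

For the idealized tube, the ratio in Eq. (1) is about -0.71, whereas a dark valley or a sheet-like structure gives a ratio of zero or above and is rejected.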

The width of the vessel is automatically estimated at points along its central 
skeleton. The method takes advantage of the fact that vessels have nearly circular 
cross-sections. The width of a tube about a central skeleton point is proportional 
to the scale that produces a maximal response from a cylindrical medialness 
measure. Define M(x, s) as the response from convolving the image at x with an
extruded Laplacian of a Gaussian kernel aligned with the central skeleton and
at a scale s. Then the radius r of the vessel at x is:

$$r \approx 0.5 \cdot \arg\max_{s} \{ M(x, s) \} . \qquad (3)$$
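Equation 3 amounts to a one-dimensional search over scales. The sketch below shows that search in Python under the assumption that the medialness response at a fixed skeleton point is available as a function of scale; the toy response used here is hypothetical and merely peaks at a known scale.

```python
def estimate_radius(medialness_at_scale, scales):
    """Radius estimate of Eq. (3): half the scale with the maximal medialness response."""
    best_scale = max(scales, key=medialness_at_scale)
    return 0.5 * best_scale


# Candidate scales from 0.5 to 6.0 in steps of 0.5 (arbitrary units).
scales = [0.5 * i for i in range(1, 13)]


def toy_response(s):
    # Hypothetical response that peaks at scale 3.0; a real implementation would
    # convolve the image with the extruded Laplacian-of-Gaussian kernel instead.
    return -(s - 3.0) ** 2


radius = estimate_radius(toy_response, scales)
print(radius)
```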



3.3 Characteristics of Segmented Vessels 

The output of the segmentation program is a set of unbranched, directed, 3D 
skeleton curves with an associated width at each point. Important characteristics 
of the segmentation include the following. 

1) The segmentation is largely complete when compared to volume rendered
images of the initial MRA. Figure 1 gives an example.

2) As MRA datasets are noisy, the segmentation may include spurious vessels.
One of the requirements for providing an accurate graph description is
elimination of spurious curves.

3) Segmented curves representing true vessels are often long and extend past
multiple branchpoints. Figure 2a shows the projection of the skeleton of a single
segmented vessel.

4) The gap between the extracted vessel skeletons of a true parent-child pair
is usually very short, in the order of a millimeter (Fig. 2c).

5) During segmentation, no new vessel is allowed to occupy territory previously
defined by another. Segmentation therefore tends to stop at “Y” branchpoints.



Fig. 2. Characteristics of extracted vessels. (a) An individual vessel skeleton (white)
projected against a DSA. The white ball at the tip indicates flow direction. The curve
is long. (b) The same vessel shown in (a) and its parent vessel. The parent begins as
the carotid artery, extends into the middle cerebral trunk, and follows a path into a
small middle cerebral branch (arrow). (c) Magnification of the parent-child connection
region. The 1 mm distance between the segmented vessel skeletons (arrow) is so short
that it cannot be seen on this projection



6) Each extracted skeleton curve consists of an ordered series of 3D points 
and thus has a direction. However, this direction is determined during extraction 
and may not correctly model the direction of blood flow. Figure 3 illustrates the
three types of “Y” connections produced. For two of these three cases, the tree 
creation protocol must modify the child’s flow direction during connection with 
a parent vessel. 




Fig. 3. “Y” connection of segmented vessels: flow redirection. Arrows indicate the
direction of flow in segmented vessels prior to connection. Child vessel V2 is about to
connect to parent vessel V1. (a) The direction of flow in V2 is correct. No adjustment
is needed. (b) Flow direction in V2 is incorrect and must be reversed. (c) V2 must be
broken and the flow reversed in the left half of the vessel

3.4 Tree Creation

The protocol for connecting segmented vessels uses both linear distance measure- 
ments and the 3D image intensity data in suspected regions of connection. Both 
“I” connections (connection of 2 endpoints) and “Y” connections (connection of 
an endpoint with an intermediate point on a segmented vessel) are permitted; 
“X” connections are not. All allowed connections therefore involve at least one 
endpoint of an extracted vessel’s skeleton curve. 

A version of the minimum spanning tree algorithm is employed. The user 
views a projection of the segmented vessels and interactively selects one or more 
roots. A maximum 3D-connection distance is set (default 2 mm) as is a max- 
imum intensity ratio (default 0.75), described below. The program then auto- 
matically builds a set of trees by progressively attaching a child to the connected 
base until no orphan remains whose connection meets both distance and image 
intensity requirements. 

On each iteration, the orphan is selected whose connection provides the minimum
“connection value”. Figure 3 shows the 3 allowed types of parent-child
connection. For each orphan-parent pair, the program estimates the 3 pairs of possible
connection points and then inspects the image data. A line is drawn between
each point pair in the 3D image. A hollow, concentric cylinder of radius larger
than that of the child is then constructed along this axis. The average image
intensities of the cylinder and line are expressed as a ratio. A low ratio (high
central intensity and low peripheral intensity) suggests a valid connection. The
“connection value” is a weighted sum of this ratio (4 x ratio) with the linear distance
between connection points. The orphan with the lowest connection value
is added next. As noted earlier, the flow direction in segmented data may not
be correct. When appropriate, the protocol therefore reverses flow in the child
(Fig. 3b) or splits the child and reverses flow in one segment (Fig. 3c).
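The greedy attachment loop described above can be sketched as follows. This is a simplified illustration, not the program used in the study: candidate connections are assumed to be precomputed as a table of (distance in mm, intensity ratio) per parent-child pair, flow reversal and vessel splitting are omitted, and all vessel names and numbers are hypothetical. The weight of 4 on the ratio and the default thresholds of 2 mm and 0.75 are taken from the text.

```python
def connection_value(distance_mm, ratio, ratio_weight=4.0):
    """Weighted sum of intensity ratio and linear distance (lower is better)."""
    return ratio_weight * ratio + distance_mm


def build_tree(roots, orphans, candidates, max_dist=2.0, max_ratio=0.75):
    """Greedily attach orphans to the connected base.

    candidates[(parent, child)] = (distance_mm, ratio) is assumed to be precomputed.
    Returns {child: parent} for every vessel that could be attached.
    """
    connected = set(roots)
    remaining = set(orphans)
    parent_of = {}
    while remaining:
        best = None
        for (parent, child), (dist, ratio) in candidates.items():
            if parent in connected and child in remaining \
                    and dist <= max_dist and ratio <= max_ratio:
                value = connection_value(dist, ratio)
                if best is None or value < best[0]:
                    best = (value, parent, child)
        if best is None:          # no orphan meets both requirements
            break
        _, parent, child = best
        parent_of[child] = parent
        connected.add(child)
        remaining.discard(child)
    return parent_of


candidates = {
    ("carotid", "A"): (1.0, 0.3),
    ("carotid", "B"): (1.9, 0.7),
    ("A", "B"): (0.5, 0.2),
    ("B", "C"): (1.0, 0.9),       # rejected: intensity ratio too high
    ("carotid", "N"): (5.0, 0.1), # rejected: too far away (e.g. a spurious curve)
}
tree = build_tree(["carotid"], ["A", "B", "C", "N"], candidates)
print(tree)
```

In this toy case B is attached to A rather than directly to the root because that connection has the lower connection value, while C and N remain orphans.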

As shown later, noise in the MRA results in the extraction of multiple spuri- 
ous curves that are processed along with curves representing true vessels. These 
spurious vessels are almost entirely eliminated during tree creation as we start 
from a given root, examine only connections involving an endpoint, and allow 
only connections involving short distances and fitting the image intensity data. 



3.5 Tree Editing and Tree Based Display 

The 3 cerebral circulations are variably connected at the skull base through the 
Circle of Willis. Our methods cannot automatically detect the direction of flow in 
these connections. The program therefore provides editing tools and the ability 
to load a DSA as a background bitmap to help the user separate or connect 
major trees of interest. 

During tree creation, each parent orders and marks the position of each child 
as the child is added, and each child marks its parent. It is therefore relatively 
simple to provide a set of tree-based editing tools. More specifically, the user may 
click on a projection point to a) delete proximal and/or distal vessel segments 
and associated subtrees, b) delete a vessel with associated subtrees, c) disconnect 
a subtree from a parent and reconnect to a user-selected parent or to a specific
parent point, and d) reverse flow in a vessel with automatic recalculation of child 
order and position. All changes are implemented in 3D. 

Similarly, a variety of display options are available. One may click on a vessel 
projection to view that vessel’s subtree, the vessel and its parent in isolation, 
or the set of connections from that vessel to the root. One may even simulate 
passage of a catheter by progressively clicking along a desired path; only the 
distal vessel and its appropriate descendents will be shown. 
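
The bookkeeping that makes these editing and display tools simple can be sketched with a small tree class. This is an illustrative sketch under our own assumptions, not the program's data structure: the class name, method names, and vessel labels are hypothetical, and geometry (connection positions, 3D points) is omitted.

```python
class Vessel:
    """One vessel in a rooted vascular tree (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.parent = None   # each child marks its parent
        self.children = []   # each parent keeps its children in order

    def add_child(self, child):
        child.parent = self
        self.children.append(child)

    def subtree(self):
        """This vessel followed by all of its descendants (depth-first)."""
        found = [self]
        for child in self.children:
            found.extend(child.subtree())
        return found

    def path_to_root(self):
        """Chain of vessels from this one back to the root."""
        path, node = [], self
        while node is not None:
            path.append(node)
            node = node.parent
        return path

    def delete(self):
        """Detach this vessel from its parent; its descendants go with it."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None


# Hypothetical labels chosen for the example.
carotid = Vessel("left carotid")
mca = Vessel("middle cerebral")
aca = Vessel("anterior cerebral")
pcom = Vessel("posterior communicator")
branch = Vessel("distal branch")
for v in (mca, aca, pcom):
    carotid.add_child(v)
pcom.add_child(branch)

pcom.delete()  # removes the vessel and, implicitly, its subtree
names = [v.name for v in carotid.subtree()]
```

Because every vessel stores its parent and its ordered children, deleting a subtree, listing descendants, and tracing a path to the root are each a few lines.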

The case analyzed in this study is that of a left carotid circulation that fills 
the anterior cerebral and left middle cerebral groups. Following segmentation of 
the left half of the supratentorial portion of an MRA, a single vascular tree was 
created with the left carotid as root. The editing tool was then used to delete 
the distal posterior communicator and right A1 segments (with simultaneous 
automatic deletion of all descendents of these vessels) to produce a single vascular 
tree that contained only the vascular groups shown by DSA. 

3.6 Evaluation by DSA Images 

Each 3D vascular tree may contain dozens of connections. Connection accuracy 
can be evaluated by the gold standard of angiography. We superimpose a pro- 
jection of each 3D tree upon a series of DSA images obtained from a variety of 
angles. Each nodal connection can then be examined individually in light of the 
information provided by DSA. 

Registration of segmented vessels with DSA is done as described by Liu El. 
The 3D/2D registration process uses as primitives 4-8 2D curves extracted from 
the DSA and an equivalent number of 3D curves extracted from the MRA. 
The program then optimizes a viewplane based disparity measure based on the 
iterative closest point paradigm between the DSA skeletons and the projections 
of the MRA skeletons. Newton’s method on the pose parameters in 3D is used 
to refine the solution iteratively. 

Four different DSA views of the left carotid circulation were available for the 
case analyzed in this paper. All four views, together with registered projections 
of our 3D vascular tree, were given to two neuroradiologists for evaluation. A 
variety of viewing options were available, including stepwise progression through 
the set of connections such that only one parent, child, and connection were 
projected (and color coded) at one time. 

Each radiologist filled out a form in which each connection was judged as: 
1) correct, 2) partially correct (a minor error of no clinical consequence), or 3) 
incorrect. A fourth category “?” indicated a miscellaneous problem, such as an 
extraneous vessel or indeterminable parentage. 

For a vessel connection evaluable under the first 3 categories, the global 
rating for that connection was taken as the worst rating given. For example, a 
connection that was judged as incorrect on even one view was judged as globally 
incorrect. For vessels and vessel connections falling into the fourth category 
(“?”), analysis was performed in two ways: as if the vessels had been removed 
from analysis and as if the vessels had been judged incorrect. 
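
The rating rules above can be sketched in a few lines. The numeric codes follow the form described in the text; the function names, the use of None for a "?" entry, and the sample ratings are assumptions made for the example.

```python
# Ratings ordered from best to worst, as on the evaluation form:
# 1 = correct, 2 = partially correct, 3 = incorrect.
CORRECT, PARTIAL, INCORRECT = 1, 2, 3


def global_rating(per_view_ratings):
    """Worst (largest) rating given to one connection across all views."""
    return max(per_view_ratings)


def summarise(connections, treat_unknown_as_incorrect):
    """Count fully correct and clinically acceptable connections.

    `connections` maps a connection label to its per-view ratings, or to
    None when the reviewer marked it "?".
    Returns (n_correct, n_acceptable, n_total).
    """
    n_correct = n_acceptable = n_total = 0
    for ratings in connections.values():
        if ratings is None:
            if not treat_unknown_as_incorrect:
                continue        # removed from analysis
            rating = INCORRECT  # counted as incorrect
        else:
            rating = global_rating(ratings)
        n_total += 1
        n_correct += rating == CORRECT
        n_acceptable += rating in (CORRECT, PARTIAL)
    return n_correct, n_acceptable, n_total


# Hypothetical ratings for three connections over four DSA views.
example = {"c1": [1, 1, 1, 1], "c2": [1, 2, 1, 1], "c3": None}
removed = summarise(example, treat_unknown_as_incorrect=False)
as_incorrect = summarise(example, treat_unknown_as_incorrect=True)
```

The two calls correspond to the two ways of handling the fourth category: the "?" connection either drops out of the denominator or stays in it as a failure.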

Statistical analysis for this test case was summarized using confidence intervals
for both the proportion completely correct and the proportion clinically
acceptable (completely or partially correct). Confidence intervals were computed 
using StatXact, as described in Johnson et.al. unm. As the confidence inter- 
vals were not independent, a Bonferroni correction was utilized so as to present 
a 97.5% confidence interval for each proportion.

The confidence intervals were calculated in the following manner. Suppose
that $p$ represents the true probability of success (correct or clinically acceptable,
depending on the case). It follows that the distribution of the number of successes
$T$ in $N$ trials is binomial and may be described as:

$$\Pr\{T = t \mid p\} = \binom{N}{t}\, p^{t} (1 - p)^{N - t} . \qquad (4)$$

After observing the number of successes ($t$) in the $N$ trials, 97.5% confidence
intervals were computed by finding $(p_L, p_U)$, where:

$$\Pr\{T \ge t \mid p_L\} = 0.0125 \quad \text{and} \quad \Pr\{T \le t \mid p_U\} = 0.0125 . \qquad (5)$$

If $t = 0$ it follows that $p_L = 0$. Likewise, if $t = N$ it follows that $p_U = 1$.
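
The interval defined by these two tail conditions can be reproduced numerically. The sketch below is a pure-Python stand-in and not StatXact: it assumes the binomial model stated above and solves each tail equation by bisection. The function names and the bisection settings are our own.

```python
from math import comb


def binom_tail_ge(t, n, p):
    """Pr{T >= t} for T ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))


def _bisect(f, target, lo=0.0, hi=1.0, iters=100):
    """Solve f(p) = target for a function f that increases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(t, n, tail=0.0125):
    """Exact binomial interval with probability `tail` in each tail."""
    if t == 0:
        p_lower = 0.0
    else:
        # Pr{T >= t | p} increases with p; find where it equals `tail`.
        p_lower = _bisect(lambda p: binom_tail_ge(t, n, p), tail)
    if t == n:
        p_upper = 1.0
    else:
        # Pr{T <= t | p} = 1 - Pr{T >= t + 1 | p}, so solve
        # Pr{T >= t + 1 | p} = 1 - tail.
        p_upper = _bisect(lambda p: binom_tail_ge(t + 1, n, p), 1 - tail)
    return p_lower, p_upper


all_correct = exact_interval(22, 22)
two_failures = exact_interval(22, 24)
```

For 22 successes in 22 trials the lower limit reduces to 0.0125^(1/22), about 0.82; for 22 successes in 24 trials the limits come out near 0.70 and 0.99.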

4 Results 

Figure 4 illustrates a DSA with a superimposed projection of the constructed left
carotid tree during the stages of tree creation. Figure 4a shows the skeletons
of all extracted vessels prior to processing. There is an enormous amount of
noise. Following automated tree creation (Fig. 4b) the noise is almost entirely
eliminated. Figure 4c shows the result after exclusion of connections to the
posterior and right carotid circulations via point and click operations.

Two modifications to the final tree were made before giving it to the neu- 
roradiologists for evaluation. A small branch that probably represented noise 
was deleted. More significantly, flow within the vessel shown in Fig. EK was 
reversed. This extracted vessel terminated 0.8 mm from a peripheral middle 
cerebral branch and originated 1.1 mm from its proper parent. The distances in- 
volved were too short to make good use of intensity evaluations at the proposed 
connection point. The resultant connection error was fixed by point and click 
editing. 

The final tree comprised 25 vessels out of the initial 140 used as input data. 
One of these vessels was a root with no parent. Twenty-four vessel connections 
were therefore available for evaluation. 

On formal evaluation, not one of the nodal connections was judged as incor- 
rect by either neuroradiologist. However, both radiologists judged 22 connections 
as fully correct and questioned or faulted two others. Reviewer A felt there was 
insufficient image data to adequately evaluate two cases. Reviewer B judged 
two connections as “partially correct” (containing a minor error of no clinical
significance). Both reviewers agreed that one connection was questionable; they 
disagreed about the second connection in question with each reviewer accepting 
as fully correct the connection that the other reviewer queried. 




Fig. 4. Skeletons of projected 3D vessels are white. (a) Extracted vessels projected
upon a DSA. There is much noise. (b) Projected vessels following tree creation with
left carotid as root. Almost all noise is eliminated, but connections remain to the
right carotid and posterior circulations. The arrow points to the right carotid artery, a
legitimate vessel connected to the left carotid circulation, but one that does not receive
flow from this patient's left carotid. (c) Final tree after point and click deletions to clear
the right and posterior circulations.



Statistical evaluation was performed on each reviewer’s response to deter- 
mine the 97.5% confidence intervals for both the proportion entirely correct and 
for the proportion clinically acceptable. For reviewer B, who marked two con- 
nections as partially correct, these intervals were respectively (70%, 99%) and 
(83%, 100%). For reviewer A, who marked two connections as “?”, results were 
calculated in two ways: with the 2 connections removed from analysis and with 
the 2 connections viewed as fully incorrect. For the case in which 22 of 22 con- 
nections were deemed correct, the confidence interval for both the percentage 
entirely correct and the percentage clinically acceptable was (82%, 100%). For 
the case in which 22 of 24 connections were considered correct and 2 incorrect, 
the confidence intervals for the proportion entirely correct and for the proportion 
clinically acceptable were both (70%, 99%). 

Tree-based description of the vasculature permits a variety of useful viewing
options not otherwise available. Figure 5 shows a sequence of images that
simulate progressive passage of a catheter through a vascular tree. When this 
viewing mode is selected, clicking on a vessel’s projection point will display only 
the distal portion of that vessel and its relevant descendents. The tree shown is 
the same as that in Fig. 4 but from a different point of view.

Fig. 5. Simulation of catheter passage through a tree by point and click. Projections
of the skeletons of segmented vessels are in white. (a) Catheter is in the carotid artery.
The anterior cerebral and middle cerebral groups fill. (b) The catheter passes into one
of the three middle cerebral trunks. The anterior cerebral group and the 2 other middle
cerebral trunks do not fill. (c) The catheter passes distally. Only a few small branches
fill.



5 Discussion 

The rapid rise of interventional neuroradiology has underlined the need for indi- 
vidualized 3D maps of the intracranial vasculature. This paper describes creation 
of vascular trees from segmented MRA data. 

The proposed task is made difficult by the complexity of the vasculature 
and because MRA datasets are noisy. Segmentation is difficult. In addition, 
any segmentation that includes large portions of the circulation is also likely to 
include spurious objects (Fig. 4a). The creation of meaningful vascular trees
therefore requires not only correct determination of parent-child relationships
but also the elimination of spurious vessels produced by noise.

Several groups have segmented a few vessels from MRA and have registered 
them with DSA pascHj. Extraction is usually limited to large vessels, however, 
and no tree description is provided. Gerig and colleagues provide images that 
suggest more complete extraction iscsini. This group also suggests graph- 
based description of the intracerebral vasculature PEI- However, the number 
of vessels actually included in their graph description appears small ^2j. These 
graphs have also not been clinically tested for accuracy. It is therefore difficult 
to compare them with those produced by the methods described here. 

5.1 Disadvantages of Our Approach 

A potential disadvantage of our segmentation protocol is that it provides geo- 
metrical information alone. We do not use the MRA flow direction data used 
by others Pi5li7li8j . We therefore do not know flow direction in the Circle of 
Willis. This report employs a DSA to segregate major vascular groups. It would 
be preferable to eliminate this step. One solution may be to use the width in- 
formation of segmented data, since arterial flow is normally directed from wider 
arteries into narrower ones. 

Fig. 6. Connectivity error produced by missing MR data. (a) 3D vascular tree comprised
of segmented vessels (white) whose skeletons are projected upon a DSA
(1024 × 1024). The topmost MR section is given by a dotted line. (b) Enlargement
of the indicated region in (a). The segmentation has a gap at the top of a loop (arrow)
because the MR is incomplete. A connectivity error is produced and the distal portion
of the loop connects incorrectly to a nearby vessel. (c) Connectivity and flow correction
by manual editing. Changes are implemented in 3D.



A limitation inherent to any method of determining nodal connections is 
that missing data may produce connectivity errors. There are at least 3 reasons 
why our segmentations may contain gaps. First, if the MRA does not cover a 
sufficient volume of the head, vascular loops will be truncated. In such cases the 
graph description either fails to include the distal part of the loop or, worse, 
falsely connects the distal loop to a neighboring vessel. Figure 6 provides an
example for a patient not included in this study. 

A second reason for gaps in the data is that our segmentations require a 
user-supplied seed point for each vessel. If the user does not inspect the image 
data carefully, a faint but important vessel may be missed. A more automated 
approach is preferable. Our group is actively pursuing a solution to this problem,
and initial results from fully automated, problem specific extraction methods 
are promising. Finally, an MRA may fail to visualize vessels containing slow or 
turbulent flow. We have no solution to the connectivity problem produced by 
missing vessel segments other than that of manual editing. 

A final disadvantage of our approach is that, for some types of surgical plan- 
ning, the amount of detail provided by MRA may be insufficient. The current 
study analyzes only the accuracy of nodal connections and the presence of extra- 
neous data. It does not address the issue of missing vessels except as such vessels 
influence the accuracy of nodal connection. We are therefore developing methods 
to provide a 3D map at an angiographic level of detail by reconstructing sets of 
DSA images and building upon the 3D base provided by MRA mm- 

5.2 Advantages of Our Approach 

Despite these limitations, our approach has several advantages, many of which 
are inherent to the segmentation method itself. First, our segmentation method 
is capable of tracking an individual vessel for long distances. Second, the ex-
tractions do not appear to jump from one vessel into another or to extend for 
noticeable distances into patches of noise. Third, the segmentation of each true 
vessel is close to complete, so that in the majority of cases the parental and 
child connection points are within 1-2 voxels of each other. This feature makes 
it possible to enforce tight connectivity requirements during tree creation and 
to deal effectively with the problem of spurious vessels produced by noise. 

Another major advantage of our approach is that the computations are rea- 
sonably fast and inexpensive. Extraction of a full MRA takes 20-30 minutes, 
registration of segmented vessels with a DSA takes about 10 minutes, and tree 
creation is performed in seconds. All programs run well on a Pentium 220 ma- 
chine under Windows. All programs require less than 64 megabytes of memory. 

The ability to provide accurate graph descriptions of the vasculature will 
benefit both surgeons and interventional neuroradiologists. Figure 5 provides
one example of how we intend to use these methods. Specifically, we intend to 
help guide endovascular procedures by tracking the position of the catheter and 
placing each vessel projection within a 3D context. 

Although further testing is required before we fully know the strengths and 
limitations of the method, these results are highly encouraging. We have ported 
all programs to the Windows environment and are writing user interfaces suitable 
for clinicians. 

Abstract. We propose to use statistical models of shape and texture
as deformable anatomical atlases. By training on sets of labelled ex- 
amples these can represent both the mean structure and appearance of 
anatomy in medical images, and the allowable modes of deformation. 

Given enough training examples such a model should be able to synthesise
any image of normal anatomy. By finding the parameters which min- 
imise the difference between the synthesised model image and the target 
image we can locate all the modelled structure. This potentially time con- 
suming step can be solved rapidly using the Active Appearance Model 
(AAM). In this paper we describe the models and the AAM algorithm 
and demonstrate the approach on structures in MR brain cross-sections. 

1 Introduction 

It has been recognised for some time that the ability to match an anatomical 
atlas to individual patient images provides the basis for solving several important 
problems in medical image interpretation. Once the atlas has been matched to a 
particular image, structures of interest can be labelled and extracted for further 
analysis. Matching to an atlas also defines the registration between different 
images of the same patient - allowing information obtained at different times or 
from different imaging modalities to be combined - and the non-rigid registration 
of images of different patients - allowing population studies to be analysed in a 
common frame of reference. Same-patient data fusion is sometimes approached 
directly as a rigid registration problem (particularly in the brain) but the atlas 
matching approach is more general. 

Given its central importance, the atlas-matching problem has received con- 
siderable attention. Two main approaches can be identified: landmark-based - 
in which key points or surfaces in image and atlas are brought into alignment; 
and image-based - in which an atlas image is allowed to deform to achieve as 
close a match as possible between corresponding pixel/voxel intensity values in
the deformed atlas and patient image. In either case, a dense correspondence 
is established between atlas and image, allowing labels and image values to be 
transferred between the two frames of reference. The landmark-based approach 
relies on extracting landmark points/surfaces on the basis of local image struc-
ture, then establishing correspondences between atlas and image landmarks. We 
have shown previously that this approach can be made efficient and robust if 
a statistical model of shape (representing the possible spatial arrangements of 
landmarks) is used to constrain the solution to the correspondence problem via 
an Active Shape Model m Once landmark correspondences have been estab- 
lished a dense correspondence is obtained by interpolation. The image-based 
approach has the advantage that all the data are used in establishing the dense 
correspondence. To set against this is the disadvantage that shape statistics 
cannot easily be used in establishing the match - typically, arbitrary elastic or 
viscous regularisation terms are used to limit the degree of deformation allowed. 
Recently Wang and Staib have described a method of incorporating statisti- 
cal shape information into an image-based elastic matching algorithm. Although 
they show that this leads to more accurate results, shape and intensity match- 
ing are combined in an ad hoc way and the method is slow. In this paper we 
describe a unified approach to matching an atlas to patient images using both 
shape and intensity information. We show how a statistical appearance model 
(atlas), describing allowable variation in shape and intensity, can be constructed 
from a set of example images. We also describe an efficient Active Appearance 
Model (AAM) algorithm for matching the model to new images by minimising 
pixel/voxel intensity differences, subject to statistical constraints captured by 
the model. We illustrate the method applied to 2-D MR images of the brain, 
using an atlas containing all the important sub-cortical structures, and present 
quantitative results demonstrating that our method achieves accurate matching 
in a few seconds on a modern PC. 



2 Background 

The inter- and intra-personal variability inherent in biological structures makes 
medical image interpretation a difficult task. In recent years there has been 
considerable interest in methods that use deformable models, or atlases, to in- 
terpret images. One motivation is to achieve robust performance by using the 
atlas to constrain solutions to be valid examples of the structure(s) modelled. 
Of more fundamental importance is the fact that, once an atlas and patient image
have been matched - producing a dense correspondence - anatomical labels
and intensity values can be transferred directly. This forms a basis for automated 
anatomical interpretation and for data fusion across different images of the same 
individual or across similar images of different individuals. For a comprehensive 
review of work in this field there are recent surveys of image registration meth- 
ods and deformable models in medical image analysis CUHl. We give here a 
brief review covering some of the more important points. 

Bajcsy et al. describe an image-based atlas that deforms to fit new images
by minimising pixel/voxel intensity differences [2]. Since this is an
under-constrained problem, they regularise their solution by introducing an elastic
deformation cost. Christensen et al. describe a similar approach, but use a viscous
flow rather than elastic model of deformation, and incorporate statistical
information about local deformations |HI8| . This results in more accurate match- 
ing, but is computationally expensive. Both approaches require good initialisa- 
tion to converge to a satisfactory solution since the deformations allowed are 
not constrained to be anatomically plausible. Landmark-based methods involve 
three steps: locating the landmarks, establishing correspondences, and warping 
the image or atlas to align the corresponded landmarks. Bookstein describes an 
elastic matching approach based on the use of thin plate splines [3] - he assumes
that landmarks have been identified and corresponded manually. Subsol et al.
m extract crest-lines, which they use to establish landmark-based correspon- 
dence. They use these to perform morphometrical studies and to match images 
to atlases. 

We have previously shown that a statistical deformation model can be used 
to simultaneously locate landmarks and establish image-atlas correspondences 
m- We obtain a parameterised statistical model of the domain of ‘legal’ shape 
variation from a set of training images. An Active Shape Model (ASM) is used 
to search for local image structure consistent with each of the landmarks, whilst 
constraining the configuration of landmarks using the statistical shape model. 
Typically, landmarks are closely spaced around the boundaries of structures of 
interest. A dense image-atlas correspondence can be established using thin plate 
splines. The original scheme was described in 2-D - it has been extended to 3-D 
by Hill et al. |lti| and Szekely et al. [24].

None of the approaches outlined above is ideal. The use of a statistical defor- 
mation model allows rapid, reasonably robust matching and provides a principled 
basis for constraining deformation during matching. The ASM algorithm does 
not, however, use the image evidence particularly efficiently - only the intensity 
data in the vicinity of landmark points affects the final solution. The image- 
based approaches of Bajcsy et. al. P and Christensen et. al. 0 use the image 
evidence more efficiently, but allow arbitrary deformations. Wang and Staib izni 
have recently attempted to incorporate statistical shape information into an 
image-based elastic matching approach. They do this by using a method very 
closely related to an ASM to find boundary landmarks in the image. An additional
elastic matching term is added to the matching criterion, to encourage the
image boundaries to coincide with the atlas boundaries. This is a rather ad hoc 
approach and the method is computationally expensive. In this paper we seek 
to unify the image-based and statistical modelling approaches in a principled 
way, leading to a method that is fast, robust and makes optimal use of both the 
image data and prior knowledge of the variability present in the class of images 
to be analysed. 

The Active Appearance Model (AAM) approach that we describe also draws 
on other previous work. Cootes et al. describe a model of the position-intensity
surface, allowing full synthesis of the appearance of objects that are variable in 
shape and intensity mi- They do not, however, describe a plausible matching 
algorithm. Nastar et al. describe a related model combining physical and statistical
modes of deformation [21]. Although they describe a matching algorithm
it requires very good initialisation. Jones and Poggio use a model capable of
synthesizing faces and describe a stochastic optimisation method to match the 
model to new face images ini. The method is slow but can be robust because of 
the quality of the synthesized images. Edwards et al. also describe models of the
combined shape and intensity appearance of faces M- They describe how the 
models can be matched to new images using an ASM; the method is fast, but 
does not make full use of the image data. Our new AAM approach is an exten- 
sion of this idea, using all the information in the combined appearance model to 
match to the image. Sclaroff and Isidoro describe Active Blobs for tracking [22].
Their approach is similar to our AAM, though an Active Blob is derived from 
a single image rather than a training set of images. The example is used as a 
template, allowing low energy shape deformations and simple intensity variation. 
In contrast, AAMs learn what are valid shape and intensity variations from a 
training set. 



3 Active Appearance Models 

This section describes our statistical appearance models and outlines the basic 
AAM matching algorithm. A more comprehensive description is given in [10]. An
AAM contains two main components: a parameterised model of object appearance,
and an estimate of the relationship between parameter errors and induced
image residuals. 

3.1 Appearance Models 

An appearance model can represent both the shape and texture variability seen 
in a training set. The training set consists of labelled images, where key landmark 
points are marked on each example object. For instance, to build a model of the 
sub-cortical structures in 2D MR images of the brain we need a number of images 
marked with points at key positions to outline the main features (Fig. 1).

Given such a set we can generate a statistical model of shape variation by 
applying Principal Component Analysis (PCA) to the set of vectors describing 
the shapes in the training set (see for details). The labelled points, x, on 
a single object describe the shape of that object. Any example can then be 
approximated using: 



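$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{b}_s \qquad (1)$$

where $\bar{\mathbf{x}}$ is the mean shape vector, $\mathbf{P}_s$ is a set of orthogonal modes of shape
variation and $\mathbf{b}_s$ is a vector of shape parameters.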

To build a statistical model of the grey-level appearance we warp each exam- 
ple image so that its control points match the mean shape (using a triangulation 
algorithm). We then sample the intensity information from the shape-normalised 
image over the region covered by the mean shape. To minimise the effect of global 
lighting variation, we normalise the resulting samples. 






T. F. Cootes et al. 




Fig. 1. Example of MR brain slice labelled with 123 landmark points around the ven- 
tricles, the caudate nucleus and the lentiform nucleus 



By applying PCA to the normalised data we obtain a linear model:

$$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g \qquad (2)$$

where $\bar{\mathbf{g}}$ is the mean normalised grey-level vector, $\mathbf{P}_g$ is a set of orthogonal modes
of intensity variation and $\mathbf{b}_g$ is a set of grey-level parameters.

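The shape and appearance of any example can thus be summarised by the
vectors $\mathbf{b}_s$ and $\mathbf{b}_g$. Since there may be correlations between the shape and
grey-level variations, we concatenate the vectors, apply a further PCA and obtain a
model of the form

$$\begin{pmatrix} \mathbf{W}_s \mathbf{b}_s \\ \mathbf{b}_g \end{pmatrix} = \mathbf{b} = \begin{pmatrix} \mathbf{Q}_s \\ \mathbf{Q}_g \end{pmatrix} \mathbf{c} = \mathbf{Q}\mathbf{c} \qquad (3)$$

where $\mathbf{W}_s$ is a diagonal matrix of weights for each shape parameter, allowing
for the difference in units between the shape and grey models, $\mathbf{Q}$ is a set of
orthogonal modes and $\mathbf{c}$ is a vector of appearance parameters controlling both the
shape and grey-levels of the model. Since the shape and grey-model parameters
have zero mean, so does $\mathbf{c}$.

Note that the linear nature of the model allows us to express the shape and
grey-levels directly as functions of $\mathbf{c}$

$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{W}_s^{-1} \mathbf{Q}_s \mathbf{c} , \qquad \mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{Q}_g \mathbf{c} . \qquad (4)$$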
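
For completeness, the step from the shape, grey-level and combined models to the expressions for shape and grey-levels as functions of the appearance parameters can be written out by substitution. The derivation below assumes that the combined mode matrix is split row-wise into a shape block $\mathbf{Q}_s$ and a grey-level block $\mathbf{Q}_g$, and that the diagonal weight matrix $\mathbf{W}_s$ is invertible; this is our reading of the symbols used above.

```latex
\begin{aligned}
\mathbf{W}_s \mathbf{b}_s = \mathbf{Q}_s \mathbf{c}
  \quad &\Longrightarrow \quad
  \mathbf{b}_s = \mathbf{W}_s^{-1} \mathbf{Q}_s \mathbf{c}, \\
\mathbf{b}_g &= \mathbf{Q}_g \mathbf{c}, \\
\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{b}_s
  &= \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{W}_s^{-1} \mathbf{Q}_s \mathbf{c}, \\
\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g
  &= \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{Q}_g \mathbf{c}.
\end{aligned}
```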

An example image can be synthesised for a given c by generating the shape- 
free grey-level image from the vector g and warping it using the control points 
described by x. 

For instance, Fig. 2 shows the effects of varying the first two shape model
parameters, $b_{s1}$, $b_{s2}$, of a model trained on a set of 72 2D MR images of the
brain, labelled as shown in Fig. 1. Figure 3 shows the effects of varying the first
two appearance model parameters, $c_1$, $c_2$, which change both the shape and the
texture component of the synthesised image.

Fig. 2. First two modes of shape model of part of a 2D MR image of the brain.
Panels: $b_{s1}$ varies by ±2 s.d.s; $b_{s2}$ varies by ±2 s.d.s

Fig. 3. First two modes of appearance model of part of a 2D MR image of the brain.
Panels: $c_1$ varies by ±2 s.d.s; $c_2$ varies by ±2 s.d.s



3.2 Active Appearance Model Matching 

We treat matching as an optimisation problem in which we minimise the differ- 
ence between a new image and one synthesised by the appearance model. 

Given a set of model parameters, $\mathbf{c}$, we can generate a hypothesis for the
shape, $\mathbf{x}$, and texture, $\mathbf{g}_m$, of a model instance. To compare this hypothesis with
the image, we use the suggested shape to sample the image texture, $\mathbf{g}_s$, and
compute the difference, $\delta\mathbf{g} = \mathbf{g}_s - \mathbf{g}_m$. We seek to minimise the magnitude of
$|\delta\mathbf{g}|$.

This is potentially a very difficult optimisation problem, but we exploit the 
fact that whenever we use a given model with images containing the modelled 
structure the optimisation problem will be similar. This means that we can learn 
how to solve the problem off-line. In particular, we observe that the pattern in 
the difference vector $\delta\mathbf{g}$ will be related to the error in the model parameters.

During a training phase, the AAM learns a linear relationship between $\delta\mathbf{g}$
and the parameter perturbation required to correct this.

$$\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g} . \qquad (5)$$

The matrix A is obtained by linear regression on random displacements from 
the true training set positions and the induced image residuals (See ^ for 
details). 

We can use (5) in an iterative matching algorithm. Given the current estimate
of model parameters, $\mathbf{c}$, and the normalised image sample at the current estimate,
$\mathbf{g}_s$, each iteration proceeds as follows:

- Evaluate the error vector $\delta\mathbf{g} = \mathbf{g}_s - \mathbf{g}_m$
- Evaluate the current error $E = |\delta\mathbf{g}|^2$
- Compute the predicted displacement, $\delta\mathbf{c} = \mathbf{A}\,\delta\mathbf{g}$
- Set $k = 1$
- Let $\mathbf{c}' = \mathbf{c} - k\,\delta\mathbf{c}$
- Sample the image at this new prediction, and calculate a new error vector, $\delta\mathbf{g}'$
- If $|\delta\mathbf{g}'|^2 < E$ then accept the new estimate, $\mathbf{c}'$
- Otherwise try at $k = 0.5$, $k = 0.25$ etc.

This is repeated until no improvement is made to the error, $|\delta\mathbf{g}|^2$, and
convergence is declared.
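
The iteration above can be sketched in a few lines. This is an illustrative sketch and not our implementation: `residual` is a hypothetical callable standing in for image sampling and model synthesis, the toy matrices are invented so that the example runs without image data, and the step scales beyond 0.25 are an assumed reading of "etc.".

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for plain Python lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def sq_norm(vector):
    """Squared magnitude |v|^2."""
    return sum(x * x for x in vector)


def aam_search(c, residual, A, max_iter=50, step_scales=(1.0, 0.5, 0.25, 0.125)):
    """Iterate c <- c - k * A * dg, reducing k until the error drops.

    `residual(c)` returns the difference vector dg = g_s - g_m for the
    parameters c. Returns the final parameters and the final error |dg|^2.
    """
    dg = residual(c)
    error = sq_norm(dg)
    for _ in range(max_iter):
        dc = mat_vec(A, dg)                  # predicted displacement
        for k in step_scales:
            c_new = [ci - k * di for ci, di in zip(c, dc)]
            dg_new = residual(c_new)
            if sq_norm(dg_new) < error:      # accept the improved estimate
                c, dg, error = c_new, dg_new, sq_norm(dg_new)
                break
        else:
            break                            # no step improved: converged
    return c, error


# Toy problem (invented): the residual is linear in the parameter error and
# A is a deliberately imperfect inverse, so several iterations are needed.
C_TRUE = [0.5, -1.0]
J = [[2.0, 0.0], [0.0, 1.0]]
A = [[0.4, 0.0], [0.0, 0.8]]


def toy_residual(c):
    return mat_vec(J, [ci - ti for ci, ti in zip(c, C_TRUE)])


c_fit, err = aam_search([0.0, 0.0], toy_residual, A)
```

In the toy problem each accepted step removes 80% of the remaining parameter error, so the estimate settles on `C_TRUE` after a handful of iterations.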

We use a multi-resolution implementation, in which we iterate to convergence 
at each level before projecting the current solution to the next level of the model. 
This is more efficient and can converge to the correct solution from further away 
than search at a single resolution. 

For example, Fig. 4 shows an example of an AAM of the central structures
of the brain slice converging from a displaced position on a previously unseen
image. The model could represent about 10000 pixels and had 30 parameters of
$\mathbf{c}$. The search took about a second on a modern PC. Figure 5 shows examples
of the results of the search, with the model points found superimposed on the
target images.

4 Results of Experiments 

We have applied our approach to 2D slices taken from similar positions in 28 3D 
MR images of the brain. The in-slice resolution is 1mm and the between slice 
resolution 1.5mm. A total of 72 slices were used, two or three from each brain 
image. Ground truth for the structures of interest (ventricles, caudate nucleus 
and lentiform nucleus) was annotated by hand using expert radiologist input. 

A set of ‘leave-one-brain-out’ experiments was performed to test the performance
of our approach.

We trained a model using all the examples except those from one brain, then 
ran the AAM to convergence on each of the excluded slices. We measured the 
quality of fit of the texture model, and the errors in the model point positions 
compared to the original labelling. We missed out each brain in turn, and averaged
the results. Table 1 summarises the results. It includes the results of
‘leave-all-in’ experiments for comparison, in which the model was used to search 
the training set. This gives an upper bound on performance. 
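The leave-one-brain-out protocol can be sketched as below. It is a minimal illustration, assuming each slice is tagged with an identifier of the brain it came from; the function name and the data layout are hypothetical and not taken from the original implementation.

```python
from collections import defaultdict

def leave_one_brain_out(slices):
    """Yield (held_out_brain, training_slices, test_slices) for each brain.

    slices: list of (brain_id, slice_data) pairs (assumed layout).
    All slices from the held-out brain are excluded from training together,
    so no slice of a test brain is ever seen while building the model.
    """
    by_brain = defaultdict(list)
    for brain_id, data in slices:
        by_brain[brain_id].append(data)
    for held_out in sorted(by_brain):
        train = [d for b, ds in by_brain.items() if b != held_out for d in ds]
        yield held_out, train, by_brain[held_out]
```

Grouping by brain rather than by slice matters here because two or three slices come from each brain and are likely to be strongly correlated.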

In addition we give the errors obtained when the model is fit directly to the 
labelled points - the ‘best fit’ column. This gives a measure of the best possible 
model fit. 

The texture difference is given as the RMS difference between the intensities 
synthesised by the model and those in the target image over the modelled region. 
The units are those of grey-level. The full range of grey-levels in the image was 
about 140 units, with noise of about 7 units (s.d.). Notice that in the miss-1-out
experiments the texture error found by search is better than that when fitting
to the hand labelled points. This is because the search is able to compromise
point position in favour of reducing texture error.

[Figure panels: Initial, 2 iterations, 6 iterations, 16 iterations (converged), original]

Fig. 4. Multi-resolution AAM search from a displaced position

The point error is given as the mean distance between corresponding model 
and image label points (Pt-Pt) and as the mean distance between model points 
and the labelled image boundary (Pt-Bnd). Close examination of the hand la- 
belled points suggests there is noise in their placement which may contribute 
considerably to the measured results. 
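The three error measures used in Table 1 can be sketched as follows. This is an illustrative reading of the definitions in the text, assuming 2D points given as (x, y) tuples and a labelled boundary given as a closed polygon; all function names are hypothetical.

```python
import math

def rms_texture_error(model, target):
    """RMS difference between synthesised and target grey-levels."""
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(model, target)) / len(model))

def mean_pt_pt(model_pts, label_pts):
    """Mean distance between corresponding model and label points (Pt-Pt)."""
    return sum(math.dist(p, q) for p, q in zip(model_pts, label_pts)) / len(model_pts)

def _pt_segment(p, a, b):
    """Distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0.0:                      # degenerate segment
        return math.dist(p, a)
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length2
    t = max(0.0, min(1.0, t))               # clamp to the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))

def mean_pt_bnd(model_pts, boundary):
    """Mean distance from model points to a closed polygonal boundary (Pt-Bnd)."""
    edges = list(zip(boundary, boundary[1:] + boundary[:1]))
    return sum(min(_pt_segment(p, a, b) for a, b in edges)
               for p in model_pts) / len(model_pts)
```

Pt-Bnd is never larger than Pt-Pt for the same fit, since it ignores displacement of a point along the boundary; this is consistent with the ordering of the two rows in Table 1.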

The code was written in C++ and run on a 166MHz Pentium II under Linux.
The mean time per model match was about five seconds for a 30 parameter, 
10000 pixel model. This would take around one second on a modern PC. 



Table 1. Performance of AAM at matching brain model to images (± s.d.) (see text)

Measure                 Miss-1-Out                    Leave-all-in
                        Search        Best Fit        Search        Best Fit
Texture Error           12.8 (±3.1)   14.6 (±2.8)     10.9 (±2.2)   8.4 (±1.5)
Pt-Pt Error (pixels)    2.4 (±0.7)    0.9 (±0.3)      1.7 (±0.4)    0.4 (±0.07)
Pt-Bnd Error (pixels)   1.2 (±0.3)    0.6 (±0.2)      0.9 (±0.2)    [illegible]

Fig. 5. Results of AAM search. Model points superimposed on target image



4.1 Examples of Failure 

Figure 6 shows two examples where the AAM has failed to locate boundaries
correctly on unseen images. In both cases the examples show more extreme shape 
variation from the mean, and it is the outer boundaries that the model cannot 
locate. This is because the model only samples the image under its current loca- 
tion. There is not always enough information to drive the model outward to the 
correct outer boundary. One solution is to model the whole of the visible struc- 
ture (see below). Alternatively it may be possible to include explicit searching 
outside the current patch, for instance by searching along normals to current 
boundaries as is done in the Active Shape Model H21. This is the subject of 
current research. In practice, where time permits, one can use multiple starting 
points and then select the best result (the one with the smallest texture error). 




Fig. 6. Detail of examples of search failure. The AAM does not always find the correct 
outer boundaries of the ventricles (see text) 

5 Discussion and Conclusions

We have demonstrated that a deformable anatomical atlas can be built using 
statistical models of shape and appearance. Both the shape and the appearance 
of the atlas can vary in ways observed in the training set. Arbitrary deformations 
are not allowed. Matching to a new image involves minimising the difference 
between the synthesised atlas image and the target. This can be achieved rapidly 
using the Active Appearance Model matching algorithm. 

The AAM may not always give optimal results, but it would be straightfor- 
ward to use a general purpose optimiser (e.g. Simplex or Powell m) to ‘polish’ 
the final fit. 

Though we have demonstrated the approach only on the central part of the brain,
models can be built of the whole cross-section. Figure 7 shows the first two modes of such
a model. This was trained from the same 72 example slices as above, but with 
additional points marked around the outside of the skull. The first modes are 
dominated by relative size changes between the structures. 




c1 varies by ±2 s.d.s

c2 varies by ±2 s.d.s



Fig. 7. First two modes of appearance model of full brain cross-section from an MR 
image 



The appearance model relies on the existence of correspondence between 
structures in different images, and thus on a consistent topology across exam- 
ples. For some structures (for example, the sulci), this does not hold true. An 
alternative approach for sulci is described by Caunce and Taylor UM- 

The approach has been demonstrated in 2D, but is extensible to 3D. The
main complications are the size of the model and the difficulty of obtaining well
annotated training data. Each mode of the texture model is the same size as an
image; if many modes are used, the model could be rather large. Obtaining good
(dense) correspondences in 3D images is difficult, and is the subject of current
research.





We hope to be able to match the models to different modalities by maximis- 
ing mutual information, rather than minimising intensity errors. During search 
we would form an ‘information difference’ image, measuring the areas in the 
target image not well predicted by the model, and use this to update the current 
parameters. 

We have shown how statistical models of appearance can represent both the 
mean and the modes of variation of shape and texture of structures appearing 
in medical images. Such models act as deformable anatomical atlases, in which 
the allowed deformation is learnt from a training set. The Active Appearance 
Model algorithm gives a fast method of matching the atlas to new images. 

1 Introduction

Our aim was to investigate and further develop a system of analysis of complex 
left ventricular wall kinetics. The proposed method is specifically adapted to 
gated myocardial perfusion SPECT data. The unique properties of gated SPECT 
data in this respect are the lack of fiduciary points, the relatively low spatial 
resolution and the conservation of total counts during the cardiac cycle.

In gated blood-pool studies, contrast ventriculography and echocardiography, 
individual points of the ventricular wall are not immediately identified. There 
may be some structures (valve plane, leaflets) which can serve as fiduciary points, 
but generally the ventricular wall is identified by an edge, i.e. as the interface 
between cavity and myocardium. Wall segments are identified from one moment 
of the cardiac cycle to the next by the intersection between an axis and an edge 
defining the wall. Motion is thus defined as the displacement of this intersection. 
The axis can be defined in various ways: as originating at the center of the cavity, 
as perpendicular to the long axis of the cavity, or as the normal to the edge. In all 
cases the intersection between axis and edge identifies the segment and identifies 
the motion. Therefore, motion unrelated to the axis cannot be detected (Fig. 1). 

2 Materials and Methods 

The data are gated myocardial perfusion SPECT images, consisting of eight
or sixteen isometric image volumes in a 64³ format. Each image volume maps
the distribution of the tracer (⁹⁹ᵐTc-Sestamibi or Tetrofosmin) in the chest of
the patient as count rate densities during a segment of the cardiac cycle. The
images are reconstructed from 63 or 64 projection images, obtained from a dual
or triple-head scintillation camera (Anger type). Reconstruction is achieved by
filtered back-projection with a restorative band-pass filter (Butterworth). After
the acquisition and prior to the reconstruction, the data are corrected for
undersampling due to slight variations in the cycle length. The reconstructed images
are centered over the myocardium, zoomed and reoriented such that the long axis
becomes parallel to the z-axis of the image volume. The center of the cavity is
placed approximately at the pixel location (32, 32, 32). The final zoomed image
contains the myocardium as the main structure with high count rate densities.
Non-structured background has lower count rate densities, but occasional
sub-diaphragmatic high densities remain (representing intra-luminal gut activity).



A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. .334- 133^ 1999. 
© Springer- Verlag Berlin Heidelberg 1999 

Fig. 1. Tracking Intersection of Axis and edge



At the time of the acquisition the tracer has largely left the vascular system 
and is located mainly in the intracellular space. The exception is that part of the 
tracer that is excreted into the gut through the biliary tract. In the reconstructed 
images this affects the sub-diaphragmatic count rate densities only, but in equal 
fashion and degree through the eight or sixteen image volumes. For this reason 
the total count rate in the images is conserved. The changes in the position of 
myocardial elements during the cardiac cycle therefore affect only the spatial 
distribution of the count rate densities. 

The proposed method rests on two principles, of which only the first can
be derived from first principles. First, the changes in the spatial distribution of
the count rate densities affect an integral function of the count rate densities
computed along any axis. Second, the actual changes can be recovered from
those integral functions computed along congruent angles.

2.1 Integral Counts Analysis 

The total image activity, S(t), is constant for all values of t (and all projection
directions). The integral function P(L, 0) defines the percentile activity at x =
L in time-bin 0. The location in x of the percentile P at t > 0 is found by
linear interpolation. The displacement vector D(x, t) is the function showing the
difference in location of a given percentile P between image 0 and image “t”
(Fig. 3). The values of D(x, t) are periodic over “t” for all values of x and the
value of D(x, 0) is zero for all values of x.
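A one-dimensional sketch of the integral counts analysis follows. It assumes the image volume has already been summed onto the x axis to give one count profile per time bin, and that every bin holds a strictly positive count; the function names are hypothetical and the interpolation convention (cumulative value reached at the end of each bin) is one possible choice, not necessarily the authors'.

```python
def cumulative_percent(profile):
    """Percentile activity P(x): percent of total counts in bins 0..x."""
    total = float(sum(profile))
    acc, out = 0.0, []
    for v in profile:
        acc += v
        out.append(100.0 * acc / total)
    return out

def locate_percentile(cum, p):
    """Fractional x at which the cumulative curve reaches p (linear interpolation)."""
    prev = 0.0                               # cumulative value before bin 0
    for i, c in enumerate(cum):
        if c >= p:
            if c == prev:                    # flat stretch: no interpolation possible
                return float(i)
            return (i - 1) + (p - prev) / (c - prev)
        prev = c
    return float(len(cum) - 1)

def displacement(profile_0, profile_t):
    """D(x, t): shift in location of the percentile found at x in time-bin 0."""
    cum_0 = cumulative_percent(profile_0)
    cum_t = cumulative_percent(profile_t)
    return [locate_percentile(cum_t, p) - x for x, p in enumerate(cum_0)]
```

Because both profiles are normalised by their own totals, the comparison relies on the conservation of total counts described above; by construction `displacement(p, p)` is zero everywhere for positive profiles, matching D(x, 0) = 0.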




Fig. 2. Effect of out-of-plane motion on in- 
plane analysis 



Fig. 3. Computing the displacement func- 
tion 

2.2 Analysis Along Multiple Congruent Angles

This derivation assumes that the analysis is performed for an integral function
defined in “x”. In this section we derive the same values, assuming that the
image has been rotated by an angle θ. As an example let θ be an angle in the
x,y plane. There is no rotation in the y,z or x,z planes. The rotated volume
has a coordinate system (x', y', z'). The transformation is defined by: z = z',
y = x' sin(θ), x = x' cos(θ). The integral analysis is now performed in x'. The
vector D(x', t) can be decomposed into an x and a y component. In this example
there is no z component. This approach can be expanded to 3 dimensions.
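Written out, the decomposition implied by the transformation above would take the following form. The subscripted component names are notation introduced here for illustration; the relations follow only from the stated mapping x = x' cos(θ), y = x' sin(θ) and should be read as a sketch rather than the authors' own formulation.

```latex
\begin{aligned}
D_x(x', t) &= D(x', t)\,\cos\theta, \\
D_y(x', t) &= D(x', t)\,\sin\theta, \\
D_z(x', t) &= 0 .
\end{aligned}
```

For θ = 0 this reduces to the unrotated case, in which the whole displacement lies along x.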

2.3 Reconstructing Three Dimensional Motion Vectors 

Three-dimensional image vectors X, Y, and Z are then reconstituted as follows:
Consider the component D(x, t) computed after the rotation θ, Φ: the vector
is re-projected in a volume D'(x', y', z', t) in such a way that D'(x', y', z', t)
= D(x, t). The volume D'(x', y', z', t) is then rotated by −θ, −Φ and added to a
volume Dx(x, y, z, t) which was initially set to zero. The same operation produces
the vectors Dy(x, y, z, t) and Dz(x, y, z, t). The working hypothesis is that the
vector volumes Dx, Dy, and Dz contain the x, y, and z motion components of
the count rate densities, and that the multiplicity of sampling angles provides
motion resolution at a near pixel level.

The congruent sampling, and the method of sampling, make the measures
independent of orientation and make no reference to a cardiac related coordinate
system. The assumption is that all motion can thus be detected and is fully
expressed in the X, Y, and Z components. Furthermore, each pixel has motion
characteristics, rather than only those pixels that are at the edge or the center of the
ventricular wall. If a particular motion is judged to be of particular significance,
it can be derived a posteriori. As an example we consider radial motion and
rotational motion.
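As an illustration of deriving a particular motion a posteriori, the in-plane X and Y components at a pixel can be projected onto the radial and tangential directions about a chosen axis. The sketch below assumes a short-axis plane with the long axis passing through (cx, cy); the function name and the sign conventions (outward radial and counter-clockwise tangential motion positive) are choices made here, not taken from the paper.

```python
import math

def radial_tangential(dx, dy, px, py, cx, cy):
    """Split an in-plane displacement (dx, dy) at pixel (px, py) into
    radial and tangential parts relative to the axis at (cx, cy)."""
    rx, ry = px - cx, py - cy
    r = math.hypot(rx, ry)
    if r == 0.0:                       # on the axis the directions are undefined
        return 0.0, 0.0
    radial = (dx * rx + dy * ry) / r       # > 0: away from the axis
    tangential = (dy * rx - dx * ry) / r   # > 0: counter-clockwise
    return radial, tangential
```

With this convention, motion toward the long axis appears as a negative radial value.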



3 Results 

Preliminary results address the following questions. 

1. Can motion that is usually detected with a preset cardiac coordinate system 
be recovered by the integral approach, which uses a posteriori coordinates? 

2. Does the integral approach give some regional information, or does the in- 
formation remain global? 

3. Can complex motion be derived? 

4. Is the method insensitive to orientation? 

The motion is displayed by the phase and amplitude of the derived displacement
vector. From the displacement in X, Y, and Z we have derived motion toward
the long axis (in plane), motion towards the center of the cavity (off plane) and
angular or rotational motion (in plane). The results of the analysis are displayed
by looking at the central orthogonal slices (long axis horizontal, long axis vertical
and short axis as in Figs. 4-7).

Fig. 4. X, Y, and Z amplitudes

Fig. 5. X, Y, and Z phases

Fig. 6. Cylindrical radial, short axis angular and spherical radial amplitudes

Fig. 7. Short axis angular and spherical radial phases



Figures 4-7 show the amplitudes and phases of the first harmonic of the
displacement functions DX, DY and DZ, and derived motion (radial, spherical
and angular). The first three rows are the representation of the central horizontal
long axis slices. Next we show the central vertical and finally the central short
axis slices. Case A is a mathematical phantom of a shrinking cylinder. The other
columns represent 5 patient cases.
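The amplitude and phase of the first harmonic of a periodic displacement curve can be computed as sketched below. This is a generic first-harmonic Fourier fit over the eight or sixteen time bins, given as an assumption about how such maps are typically produced; the function name and the phase convention are introduced here.

```python
import cmath
import math

def first_harmonic(d):
    """Amplitude A and phase phi of the first harmonic of a periodic series d,
    so that d[t] is approximated by mean + A * cos(2*pi*t/N - phi)."""
    n = len(d)
    coeff = sum(v * cmath.exp(-2j * math.pi * t / n)
                for t, v in enumerate(d)) * 2.0 / n
    return abs(coeff), -cmath.phase(coeff)
```

Applied pixel by pixel to each displacement component, this yields one amplitude value and one phase value per pixel, which can then be displayed as images.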

All questions can be answered positively. We can indeed recover centripetal 
motion, in plane and off plane, and local or regional particularities can be de- 
tected. 

4 Discussion 

If ventricular wall kinetics adds information to the myocardial perfusion studies 
or indeed yields important clinical information by itself, one can expect that 
any improvement in the definition of it would increase its clinical utility. One 
possible improvement is the inclusion of complex motion or deformation anal- 
ysis. Complex motion includes off-plane motion and rotational motion. Most 
described methods (for the analysis of gated myocardial SPECT) cannot detect 
complex motion. Our proposal addresses the problem directly. The ultimate goal 
is not to develop a method that could effectively use all the information yielded 
by MRI or echocardiography, but to enrich at no or little cost the information 
yielded by myocardial perfusion studies. 

One important feature of gated SPECT is that the data are truly three- 
dimensional, isotropic and that all parts of the image are recordings of the same 
cardiac cycles. The truly volumetric aspect of the data has generally not been 
fully utilized, with many authors restricting the analysis to motion or deforma- 
tion in a plane or in complex combinations of planes |niTT)| . We 
believe that cross plane displacement makes “in plane” analysis fundamentally
incomplete, that the polar 3D method is an improvement (Fig. 2), but that 
motion detection independently of an a-priori cardiac coordinate system is the 
ultimate answer. 

The method is experimental and in an early phase. There is no derivation
from first principles that assures that the method should work. Specifically, we do
not know if the additive decomposition of the integral function, taken at different 
spatial angles, will yield sufficiently accurate displacement vectors at the pixel 
level. We are, however, able to predict which factors could have a critical effect 
on the outcome. In addition, we have methods to test if motion can be effectively 
characterized. 



The basic assumption is that the unstructured background and the sub- 
diaphragmatic structured background remain invariant during the cardiac cy- 
cle. This assumption seems physiologically reasonable, but noise could produce 
regional variations, which in turn could influence the integral function. In ad- 
dition, the original (centered, zoomed and reoriented) image contains non-zero 
pixel densities in all pixels of the cube. However, after a full rotation and map- 
ping into another cube, some pixels (at the corners) cannot be mapped in the 
new volume. The effect of clipping the counts (setting all count rate densities 
< T equal to zero, while maintaining those > T at their original value) must 
be investigated. The level of T could be defined by the functional criterion we
defined earlier. Another possibility is masking. All pixels outside the largest
sphere inscribed in the image volume can be set to zero, or all pixels outside of a
mask surrounding the myocardium. Masking could be based on a segmentation
described earlier |14l2l4l9lin| . 
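The two pre-processing options discussed above, clipping at a threshold T and masking outside the inscribed sphere, can be sketched as follows. The code assumes a cubic volume stored as nested lists indexed [z][y][x]; the function names are hypothetical, and keeping values exactly equal to T is a choice made here since the text leaves that case open.

```python
def clip_counts(volume, T):
    """Set count rate densities below T to zero; keep the rest unchanged."""
    return [[[v if v >= T else 0.0 for v in row] for row in plane]
            for plane in volume]

def mask_outside_sphere(volume):
    """Zero all voxels outside the largest sphere inscribed in a cubic volume."""
    n = len(volume)
    c = (n - 1) / 2.0                  # centre of the cube in index coordinates
    r2 = (n / 2.0) ** 2                # squared radius of the inscribed sphere
    out = []
    for z, plane in enumerate(volume):
        new_plane = []
        for y, row in enumerate(plane):
            new_plane.append([
                v if (x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2 <= r2 else 0.0
                for x, v in enumerate(row)
            ])
        out.append(new_plane)
    return out
```

The spherical mask has the property that its content stays inside the cube under any rotation, which addresses the corner-clipping issue raised in the text.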

It should be mentioned at this point that one method was described 0 
which was at the same time truly three-dimensional and did not use a preset 
coordinate system. The method was based on a three-dimensional matching 
method, originally described by Besl P and later Feldmar [3|. The method was 
initially utilized to match static myocardial SPECT images ('^18191 1 1 )j . Declerck 
0 adapted the method to four dimensions. The analysis however favors some 
directions and works only on endo- and epicardial surfaces. 

In conclusion: Our preliminary results show that we are able to extract kinetic 
information from gated myocardial perfusion SPECT images using prototype 
analysis algorithms. The results also support our hypothesis that a combined 
analysis of perfusion and kinetics from the same perfusion SPECT images will 
enable more accurate classification of patients with a variety of perfusion defects. 

Abstract. Blood pool contrast agents for Magnetic Resonance Angiog-
raphy (MRA) have a prolonged intravascular half-life and therefore have 
the potential for visualizing large anatomical regions with high reso- 
lution. A potential problem is that both the arteries and veins are en- 
hanced, resulting in venous overprojection in Maximum Intensity Projec- 
tions (MIPs), which are most widely used for inspecting MRA datasets. 
In this paper a novel approach for improved arterial visualization is intro- 
duced. It is based on suppressing the major overlapping veins in MIPs. 
The approach is illustrated on MRA images of the peripheral vasculature 
acquired using the blood pool agent NC100150. The resulting visualiza- 
tions are compared to Digital Subtraction Angiography (DSA) images. 



1 Introduction 

Conventional noninvasive MRA is an accepted clinical technique which facilitates 
high quality depiction of the cerebral vasculature. For abdominal and peripheral 
imaging, the effectiveness of conventional MRA is limited owing to some intrinsic 
limitations of the technique. Especially complicated flow patterns and in-plane 
flow may result in signal voids which can lead to an overestimate of a stenosis
[1].

The introduction of Gadopentetate dimeglumine as a T1 shortening contrast 
agent [2] has considerably increased the clinical applicability of MRA. Since the
shortened T1 of blood provides contrast, rather than the flow dynamics, the 
technique is less sensitive to flow conditions. Moreover, high contrast can be 
obtained in shorter examination times, enabling breath hold sequences which 
reduce motion artifacts. A shortcoming of Gadopentetate dimeglumine is its 
rapid diffusion in extracellular space. This limits the imaging window to a few 
minutes since the background signal increases as well. 

Ultra-small SuperParamagnetic Iron Oxide (USPIO) particles are a new class 
of MRI contrast agents. They were primarily designed for their T2* relaxation 
properties, but also exhibit strong T1 shortening properties in blood 0, and 
can therefore also be used for Contrast Enhanced (CE) MRA. The primary
advantage of these blood pool agents is their long intravascular half-life, which
paves the way for steady state MRA. Longer examination times enable coverage 
of larger anatomical regions at higher spatial resolution. 

An important drawback of blood pool agents is the simultaneous enhance- 
ment of arteries and veins. This can significantly hamper diagnosis of e.g. the 
main arterial branches EEEI An anatomical region in which this is certainly 
the case is the leg, which is highly vascularized and has arteries and veins run- 
ning close to each other. Either the acquisition should be modified in order to 
construct selective arterial and venous angiograms, which is nontrivial, or ret- 
rospective image processing is required. In this paper we introduce one possible 
approach for enhanced arterial visualization, which is based on the idea of re- 
moving the most important overlapping veins, prior to performing a MIP. 

2 Image Acquisition 

Patients were included as a part of a Phase II study of NC100150 injection
(Nycomed Imaging AS, Oslo). Imaging was performed on a 1.5 T system (Gy- 
roscan NT, Powertrak 6000, Philips Medical Systems, Best, The Netherlands), 
using a gradient echo technique. Images showed strong vascular enhancement, 
both in the arteries and veins. In Fig. 1 we show a coronal slice of the upper
leg/abdominal region and a corresponding MIP. Owing to the adjacency of the
major arteries and veins, the status of the arteries cannot be determined from
these images, even if MIPs from different angles are reconstructed. Therefore, en- 
hanced arterial visualization is considered an important step towards the clinical 
use of MRA blood pool agents jXIE|. 




Fig. 1. Coronal slices in the abdominal/upper leg region (left) and the corresponding 
MIP (right) 

3 Enhanced Arterial Visualization

Existing techniques for vessel enhancement and segmentation cannot
straightforwardly be used for blood pool MRA images. The adjacency of a large number of
small arteries and veins makes segmentation more complicated than in conven- 
tional MRA images. To overcome these problems, we devised a relatively simple 
technique which is limited to the automated segmentation of a small number of 
the main overlapping veins, which are selected by an operator. This approach has 
two main advantages. First, the algorithm is fast since it only uses local com- 
putations. This distinguishes the method from approaches that first compute 
all features in the image that have a vessel-like shape, which are subsequently 
grouped. Results from our procedure are readily available, which is important for 
clinical use. Secondly, the segmented veins are removed rather than performing 
a segmentation of the arteries. Thus, the status of the arteries is judged from the 
original data, which limits the chance of introducing errors in diagnosis, owing 
to imperfections in the segmentation. The procedure is schematically drawn in
Fig. 2.




Fig. 2. Outline of the algorithm to enhance arterial visualization in MIP images. In
the images the main arteries (grey), the major veins (black) and other overlapping
venous structures (dark grey) are shown. First (I), the major venous structures are
segmented and suppressed in the MIP. Subsequently (II), a targeted MIP (grey band)
is performed, which removes most remaining overlapping vessels.
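The two steps in the outline can be sketched as a single projection routine: voxels in the segmented veins are skipped (step I), and the projection is restricted to a slab of slices (step II). This is a minimal illustration that assumes a volume stored as nested lists indexed [z][y][x], projection along z only, and non-negative intensities; the function name and arguments are hypothetical, and the interactive tool described in the paper supports arbitrary projection directions.

```python
def targeted_mip(volume, vein_mask=None, slab=None):
    """Maximum intensity projection along z with optional vein suppression.

    volume[z][y][x] : non-negative intensities (assumed layout)
    vein_mask       : same shape, truthy where a segmented vein voxel is (optional)
    slab            : (z0, z1) half-open slice range to project over (optional)
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    z0, z1 = slab if slab is not None else (0, nz)
    mip = [[0.0] * nx for _ in range(ny)]
    for z in range(z0, z1):
        for y in range(ny):
            for x in range(nx):
                if vein_mask is not None and vein_mask[z][y][x]:
                    continue               # suppressed vein voxel: ignore
                v = volume[z][y][x]
                if v > mip[y][x]:
                    mip[y][x] = v
    return mip
```

Only the projection ignores the masked voxels; the volume itself is left untouched, in line with the stated aim of judging the arteries from the original data.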



The tools which are required for this procedure are (i) a reliable segmentation
tool for the veins, and (ii) an interactive tool to perform targeted MIPs in
arbitrary directions. The procedure for vessel segmentation is adapted from an
algorithm to determine the central vessel axis for the preoperative evaluation of
patients who are scheduled for minimally invasive treatment of an abdominal
aneurysm [7] and is illustrated in Fig. 3.




Fig. 3. Schematic of vessel segmentation. The user initializes two starting points on
the central vessel axis. Segmentation is performed in a plane perpendicular to the
vessel segment which is defined by these points. The gravity point of the segmentation
becomes the new point on the vessel axis. A next point is predicted by extrapolation,
and the procedure iterates until the desired vessel segment is tracked.



First, two points are selected which define a first segment of the central vessel 
axis. A plane perpendicular to this segment is constructed, where the boundary 
of the vessel is determined using dynamic programming. Here a transformation 
into polar coordinates is made with the point on the central vessel axis as origin. 
A minimal cost path is found based on the image gradient magnitude in the 
direction of a ray originating from the origin: 

$$\frac{\partial I}{\partial r} = \frac{\mathbf{r} \cdot \nabla I}{|\mathbf{r}|} \qquad (1)$$

Using the gradient in the direction of the ray assures that only transitions from
high signal (vessel) to low signal (background) are considered part of the
lumen boundary. The gradient is computed by convolving with a Gaussian of
scale σ = 2 in order to be more robust to noise. Based on the estimated contour
a new point on the central vessel axis is defined by the gravity point. Based on
this point and the previous point a new point is estimated by extrapolation. The
procedure is iterated until the desired vessel segment is tracked.
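The minimal cost path search in polar coordinates can be sketched with a small dynamic programming routine. The sketch assumes a cost table `cost[a][r]` holding, for each ray `a` and radius index `r`, the radial gradient value (so strong high-to-low transitions give the lowest cost); the function name, the `max_jump` smoothness limit and the omission of a closure constraint between the last and first ray are simplifications introduced here.

```python
def min_cost_contour(cost, max_jump=1):
    """Return one radius index per ray minimising the summed cost,
    with the radius allowed to change by at most max_jump between rays."""
    n_rays, n_rad = len(cost), len(cost[0])
    acc = list(cost[0])                # accumulated cost for the first ray
    back = []                          # back-pointers, one row per later ray
    for a in range(1, n_rays):
        new_acc, ptr = [], []
        for r in range(n_rad):
            lo, hi = max(0, r - max_jump), min(n_rad, r + max_jump + 1)
            best = min(range(lo, hi), key=lambda q: acc[q])
            new_acc.append(acc[best] + cost[a][r])
            ptr.append(best)
        acc = new_acc
        back.append(ptr)
    r = min(range(n_rad), key=lambda q: acc[q])
    path = [r]
    for ptr in reversed(back):         # trace the optimal path backwards
        r = ptr[r]
        path.append(r)
    path.reverse()
    return path
```

The centre of gravity of the resulting contour would then give the next axis point, and extrapolating from the last two axis points gives the origin for the following cross-section.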

4 Results 

The algorithm to enhance arterial visualization has been applied to six patients
included in the study. In the left image of Fig. 4 we show a typical result of venous
segmentation using the semi-automated tracking procedure. In this case, no user
interaction other than the initialization of the first two points was required.




Fig. 4. Venous segmentation in the upper leg and abdominal region (left), a targeted
MIP after venous suppression (middle) and the corresponding DSA image (right)



In the middle of Fig. 4 we show a MIP after the major venous structures
adjacent to the important arteries have been removed. Since other overlapping
venous structures are now relatively distant (in the original 3D data) from the
main arterial branches, remaining overprojection is reduced by a targeted MIP.
Both in the DSA image and the processed MRA image a large dissection in
the right iliac artery and a stenosis can be seen, which were not visible in the
unprocessed MIP.



5 Discussion 

MRA using blood pool agents has the potential for covering large anatomical 
regions of interest at high resolution. However, the simultaneous enhancement 
of veins and arteries hampers a quick interpretation of the images using MIPs, 
and possibly limits the clinical utility of these agents. 

A possible strategy, which is advantageous for a number of applications, is 
the segmentation of the entire venous structure. For certain anatomical locations 
with a small number of vessel structures, or for sufficiently reduced regions of 
interest, this seems possible without excessive user interaction. For the entire 
leg, however, it is a difficult procedure. In this paper, we investigated whether 
a relatively simple procedure, which only segments the major veins which are 
adjacent to the arterial branches of interest, and subsequently removes other 
structures by performing targeted MIPs, yields satisfactory arterial
visualization. The method is fast, allows for user supervision and does not influence the
original data around the arteries, so that the anatomical context can still be
assessed interactively. Results in the aortoiliac region show that the procedure
aids in diagnosis from MIPs. 

There are a number of points that could be improved. Simultaneously track- 
ing arteries and veins which run close along each other will avoid the chance of 
including arterial voxels in the venous segmentation. Second, additional informa- 
tion can be obtained during the MRA acquisition, e.g. using first pass imaging 
or flow information. Developments in this area will be crucial to the clinical 
applicability of blood pool agents. 

Abstract. Accurate delineation of the volumetric motion of the left ventricle
(LV) of the heart over time from tagged MRI is an important area of
research. We have built a system that takes tagged short-axis (SA) and
long-axis (LA) image sequences as input, and fits a 4D B-spline model to the
LV of the heart by simultaneously fitting knot solids to the SA and LA
frame sequences via matching 3 sequences of model knot planes to LV
tag planes for 4D tracking. The advantage of the 4D model is that 3D
material point localization and displacement reconstruction is achieved
in a single step. The generated 3D displacement fields are validated with
a cardiac motion simulator, and 3D motion fields capturing in-vivo
deformations in a porcine model of a LV with postero-lateral myocardial
infarction are illustrated.



1 Introduction 



Noninvasive techniques for assessing the dynamic behavior of the human heart 
are invaluable in the diagnosis of ischemic heart disease, as abnormalities in the 
myocardial motion sensitively reflect deficits in blood perfusion p. In MR tag- 
ging, the magnetization property of selective material points in the myocardium 
are altered in order to create tagged patterns within a deforming body such as 
the heart muscle. The resulting pattern defines a time- varying curvilinear coor- 
dinate system on the tissue. During tissue contractions, the grid patterns move, 
allowing for visual tracking of the grid intersections over time. The intrinsic high 
spatial and temporal resolutions of such myocardial analysis schemes provide un- 
surpassed information about local deformation in the myocardium which can be 
used to derive strain and deformation indices from different myocardial regions. 

Previous research in analysis of tagged images includes 1 1 12lbltil7| . Among 
various approaches which have been proposed in the literature for analysis of 
tagged images, our previous work in is most closely related to this paper. 
In our former paper, we proposed a B-spline solid model to concurrently track 
tag lines in different image slices by implicitly defined B-spline surfaces which 
align themselves with tagged points. The primary contribution of this paper is 
in utilizing a knot solid to represent each pair of SA and LA frame of data and 

[Figure label: Knot plane]

Fig. 1. A hyperpatch representing a small deforming cuboid enclosed by 6 tag planes



using 3 sequences of model knot planes to detect 3 sequences of LV tag planes. 
Once the 4D model is able to generate a B-solid which varies continuously over 
time, a 3D motion field between any two time instants is immediately available. 
The advantages of the B-spline approach over previous approaches to tagged MR 
image analysis are: (1) B-spline interpolation is performed over 3D space and 
time. (2) The movement of each myocardial point over time can be captured very 
accurately by setting the three parameters u, v, w of the model to any fractional 
value. (3) Intersections of three orthogonal tag planes and their motions are 
immediately available. (4) Change of strain over time can easily be computed at 
all myocardial points. 



2 4D B-spline Representation 

The simplest and most direct geometric element to model a time-varying solid is
a hyperpatch P|. A hyperpatch (Fig. 1) is a patch-bounded collection of points
whose coordinates are given by continuous, four-parameter, single-valued
mathematical functions of the form: {x = x(u, v, w, t), y = y(u, v, w, t), z = z(u, v, w, t)}
where t is the time variable. The parametric variables u, v, and w are constrained
to the interval u, v, w ∈ [0, 1] in a hyperpatch. A point (x, y, z) inside the
hyperpatch is represented by S(u, v, w, t) and at a time instant t = t*, fixing the value
of one of the parametric variables results in an isoparametric surface within
or on the boundary of the hyperpatch in terms of the other two variables, which
remain free. Many hyperpatches, tightly placed together, form a solid, and each
hyperpatch shares its six faces with six neighboring hyperpatches. In the solid
representation, ranges of u, v, w are from 0 to some integer value. For instance,
u ∈ [0, 1] denotes the first array of hyperpatches in terms of the v, w
parameters; u ∈ [1, 2] denotes the second array of hyperpatches, and so forth. The
surface determined by setting one of u, v, w to a constant integer value is called
a knot surface or a knot plane; these are the delimiting surfaces of the
hyperpatches. In a 4D B-spline model, knot planes become temporal functions,
and the 3D solid captured at each knot time instant is called a knot solid. A
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tensor product 4D B-spline model is expressed as: 



I J K L 

s{u,v,w,t) = PijkiNi{u)Nj{v)Nk{w)Ni{t) (1) 

2=1 j—1 k—1 1—1 



where {I x J x K x L) is the total number of model control points; Ni{u), 
Nj{v), Nk{w), and Ni{t) are B-spline basis functions which blend control points 
Pijki- By changing the order of B-spline summation, a more efficient approach to 
computing a multi-dimensional B-spline model results. Given any time instant 
t*, a 3D grid of control points is specified that determines the 3D solid at t*. To 
compute the solid at t*, let us start with calculating the u = u* isoparametric 
planes. This is implemented in two steps: first we calculate all points with u = u* 
value along the B-spline curves, specified by each thread of control points in the 
u direction. We then calculate each B-spline surface by taking these u = u* 
points from the first step as control points, obtaining the u = u* isoparametric 
planes. This procedure may be mathematically stated as: 

$$S(u^*, v, w, t^*) = \sum_{l=1}^{L} \left( \sum_{k=1}^{K} \sum_{j=1}^{J} N_j(v)\, N_k(w) \left( \sum_{i=1}^{I} P_{ijkl}\, N_i(u^*) \right) \right) N_l(t^*) \qquad (2)$$



Once we are able to compute the isoparametric plane, S(u*, v, w, t*), we can 
obtain the entire model at time instant t* by continuously varying u*. The 
advantage of this method over the tensor product method in (1) is its speed: 
it bypasses the multiplication of large matrices whose elements are mostly 
zeros (due to B-spline bases having limited spatial extent). 
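
The reordering of the summation described above can be illustrated with a small sketch. It is not the authors' implementation: the control values are scalars (one coordinate only), the knots are uniform, the basis is quadratic, and all function names are hypothetical. It collapses the u direction first, then v, w and t, and checks the result against the direct quadruple sum of (1).

```python
from itertools import product


def basis(i, p, knots, x):
    """Cox-de Boor recursion: i-th B-spline basis function of degree p at x."""
    if p == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + p] - knots[i]
    if left_span > 0:
        value += (x - knots[i]) / left_span * basis(i, p - 1, knots, x)
    right_span = knots[i + p + 1] - knots[i + 1]
    if right_span > 0:
        value += (knots[i + p + 1] - x) / right_span * basis(i + 1, p - 1, knots, x)
    return value


def contract_first_axis(grid, weights):
    """Weighted sum over the first axis of a nested list (any depth)."""
    if not isinstance(grid[0], list):
        return sum(w * g for w, g in zip(weights, grid))
    return [contract_first_axis([sub[n] for sub in grid], weights)
            for n in range(len(grid[0]))]


def evaluate_sequential(P, rows):
    """Collapse one parametric direction at a time: u, then v, w, t."""
    out = P
    for weights in rows:
        out = contract_first_axis(out, weights)
    return out


def evaluate_direct(P, rows):
    """Direct quadruple sum over all control points, as in the tensor product form."""
    Nu, Nv, Nw, Nt = rows
    sizes = [range(len(r)) for r in rows]
    return sum(P[i][j][k][l] * Nu[i] * Nv[j] * Nw[k] * Nt[l]
               for i, j, k, l in product(*sizes))


n, p = 4, 2                                   # assumed: 4 control points per axis, quadratic
knots = list(range(n + p + 1))                # assumed: uniform knot vector
P = [[[[i + 2 * j + 3 * k + 4 * l + i * l for l in range(n)]
       for k in range(n)] for j in range(n)] for i in range(n)]
rows = [[basis(i, p, knots, x) for i in range(n)] for x in (2.3, 3.1, 2.9, 3.6)]
s_seq = evaluate_sequential(P, rows)
s_dir = evaluate_direct(P, rows)
```

Both routes give the same value; the practical saving comes from reusing the intermediate u = u* points for many (v, w) samples and from the limited support of each basis function.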



3 B-spline Fitting 



The tag lines on LA and SA images are formed by intersecting image slices with 
one or two sequences of tag planes, respectively. From the tag lines on SA and 
LA frames, the B-spline model can fit each knot solid to each frame of data by 
matching 3 orthogonal sequences of knot planes to 3 orthogonal sequences of 
tag planes (Fig. 2). Since these tag planes deform with the myocardial tissue, 
the 4D model will then automatically interpolate the volumetric deformations 
of the LV over time and 3D space. We employ the Chamfer distance to build 
an objective function for fitting the tag planes. The total energy for the model, 
which is to be minimized, is defined as the sum of the energy of each knot solid 
which is defined by the sum of the energy of each knot plane. The energy of each 
knot plane is further defined as the integral of the corresponding potential over 
the knot plane surface. Thus the total energy for the model can be expressed as: 



$$E = \sum_{t} \left( \sum_{u=1}^{U_m} \iint C_u(S(u,v,w,t))\, dv\, dw + \sum_{v=1}^{V_m} \iint C_v(S(u,v,w,t))\, du\, dw + \sum_{w=1}^{W_m} \iint C_w(S(u,v,w,t))\, du\, dv \right) \qquad (3)$$

Fig. 2. Knot solids fit temporal frames of data, each including 3 orthogonal sequences 
of tag lines 



where we have used C_u, C_v, C_w to denote the split 4D potentials (a separate one 
for each tag plane) and U_m, V_m, and W_m are the maximum knot values. 
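
As a rough illustration of a Chamfer distance potential, the sketch below computes a two-pass 3-4 chamfer distance transform of a 2D binary tag-line mask and sums the potential along a candidate line. The paper uses 4D potentials and integrates over knot planes; the 2D setting, the 3-4 weights, and the names used here are assumptions made only for the example.

```python
def chamfer_distance(mask, a=3, b=4):
    """Two-pass 3-4 chamfer distance transform of a binary 2D mask.

    Cells where mask is True (tag pixels) get distance 0; every other cell
    gets an approximate distance to the nearest tag pixel.
    """
    rows, cols = len(mask), len(mask[0])
    inf = 10 ** 9
    d = [[0 if mask[r][c] else inf for c in range(cols)] for r in range(rows)]
    forward = ((0, -1, a), (-1, 0, a), (-1, -1, b), (-1, 1, b))
    backward = ((0, 1, a), (1, 0, a), (1, 1, b), (1, -1, b))
    # Forward pass: propagate distances from the top-left corner.
    for r in range(rows):
        for c in range(cols):
            for dr, dc, w in forward:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    d[r][c] = min(d[r][c], d[rr][cc] + w)
    # Backward pass: propagate distances from the bottom-right corner.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            for dr, dc, w in backward:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    d[r][c] = min(d[r][c], d[rr][cc] + w)
    return d


def line_energy(potential, col):
    """Sum of the potential along a vertical candidate line at a given column."""
    return sum(row[col] for row in potential)


# A single vertical tag line at column 2 of a 5 x 5 image.
tag_mask = [[c == 2 for c in range(5)] for _ in range(5)]
potential = chamfer_distance(tag_mask)
```

The energy is zero when the candidate line coincides with the tag line and grows as the line moves away from it, which is the behavior the fitting relies on.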

Our model is based on a 4D grid of control points and the total energy of the 
model is a function of all control points. Every control point is related to frames 
of data and 3 sets of tag planes. Although the potential functions are split, all 
knot planes are simultaneously optimized. For energy minimization, we used the 
adaptive conjugate gradient descent method which shortens the step length prior 
to taking a step in the search direction that passes over the minimum point. The 
process halts if the step length becomes smaller than a threshold. 
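
The step-shortening rule and the stopping criterion can be sketched as follows. For brevity this example uses plain steepest descent rather than conjugate directions, and the quadratic test energy, the halving factor, and the threshold are assumptions, not values from the paper.

```python
def descend(energy, grad, x, step=1.0, min_step=1e-8, max_iter=10000):
    """Descent with step shortening.

    A trial step is accepted only if it lowers the energy; otherwise the
    step would pass over the minimum, so the step length is halved. The
    loop halts once the step length falls below min_step.
    """
    e = energy(x)
    for _ in range(max_iter):
        if step < min_step:
            break
        g = grad(x)
        trial = [xi - step * gi for xi, gi in zip(x, g)]
        e_trial = energy(trial)
        if e_trial < e:
            x, e = trial, e_trial
        else:
            step *= 0.5
    return x, e, step


def energy(x):
    """Hypothetical quadratic energy with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2


def grad(x):
    """Gradient of the quadratic energy above."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


x_min, e_min, final_step = descend(energy, grad, [0.0, 0.0])
```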

4 Application to Tagged Images of the LV 

We adopted a quadric-quadric-quadric-quadric B-spline model to perform vali- 
dations. We utilized a cardiac motion simulator II2I8I to generate a sequence of 
deformed prolate spheroidal models of the LV. The tag lines in the simulated 
SA and LA images were first extracted. Then the system grouped tag lines by 
each tag plane and separate 4D Chamfer distance potentials were created for 
each tag plane. 

The simulated data included 6 frames. Each frame included 8 SA image slices, 
7 LA image slices, 14 tag planes (7 horizontal and 7 vertical) intersecting SA im- 
age slices, and 8 tag planes intersecting LA image slices. The fitting iteration for 
all frames took about 4.86 ms per control point on a Sun Ultra 30/300 platform. 
We used an 8 × 8 × 9 × 7 grid of control points. The fitting algorithm converged 
in about 30 iterations. Therefore, the total fitting process took approximately 
588 seconds for 6 frames of data. 
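
The quoted total follows directly from the figures above, as this short check shows:

```latex
8 \times 8 \times 9 \times 7 = 4032 \ \text{control points}, \qquad
4032 \times 4.86\ \text{ms} \approx 19.6\ \text{s per iteration}, \qquad
30 \times 19.6\ \text{s} \approx 588\ \text{s}.
```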

An important byproduct of our approach is that at the conclusion of fitting 
knot solids to frames of data, a 4D model S(u,v,w,t) is determined. Given two 
solids S(u,v,w,t_0) and S(u,v,w,t_1), a 3D B-spline interpolated motion field is 
immediately generated by employing the computation in (2): 

$$V(u,v,w) = S(u,v,w,t_1) - S(u,v,w,t_0) \qquad (4)$$

Fig. 3. RMS error plots between 3D ground-truth and 3D computed motion fields for 
a range of parameters: k1: radially dependent compression, k2: torsion, k3: 
ellipticalization in LA plane, and k7: shear in z direction 

The cardiac motion simulator was used to validate the accuracy of the generated 
motion fields. True 3D motion fields were first generated by the simulator. The 
V computed by (4) was then strictly compared with the ground truth. Figure 
3 shows RMS error plots between true and computed motion fields for a range 
of deformation parameters of the simulator. The method was also applied to 
images collected from a porcine model of a LV at baseline and after induction of 
a postero-lateral myocardial infarction (MI). Results from this experiment are 
illustrated in Fig. 4. The motion fields displayed were computed from the knot 
solid at frame 11 and the knot solid from frame 0 (see (4)). The akinetic areas 
of the myocardium can readily be recognized from the post-MI motion fields. 
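
A minimal sketch of the validation step: the motion field is the difference of two solids sampled at the same (u, v, w) values, and the error against a ground-truth field is summarized as an RMS value. The exact RMS definition used for the plots is not given in the text, so the per-point vector-difference form below is an assumption, as are the sample coordinates.

```python
import math


def motion_field(solid_t0, solid_t1):
    """Displacement at each sample: S(u, v, w, t1) - S(u, v, w, t0)."""
    return [tuple(b - a for a, b in zip(p0, p1))
            for p0, p1 in zip(solid_t0, solid_t1)]


def rms_error(field_a, field_b):
    """Root mean square of the vector differences between two fields."""
    squared = [sum((a - b) ** 2 for a, b in zip(va, vb))
               for va, vb in zip(field_a, field_b)]
    return math.sqrt(sum(squared) / len(squared))


# Two hypothetical sample points of a solid at two time instants.
solid_t0 = [(0, 0, 0), (1, 0, 0)]
solid_t1 = [(0, 0, 1), (1, 2, 0)]
computed = motion_field(solid_t0, solid_t1)
truth = [(0, 0, 1), (0, 1, 0)]
error = rms_error(computed, truth)
```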

5 Conclusions 

We have built a system to fit and track tagged MRI data with a 4D deformable 
B-spline model. The presented framework for fitting model knot solids to frames 
of data by matching three orthogonal sequences of knot planes to three sequences 
of tag planes for volumetric tracking is the primary contribution of this article. 
After the tag lines were extracted and grouped by tag planes, 4D Chamfer dis- 
tance potentials were computed and used in fitting B-spline knot solids to frames 
of data. The generated 3D motion fields were validated with a cardiac motion 
simulator, and methods were applied to in-vivo data sets. 

Fig. 4. The porcine model of a LV in short-axis orientation with a postero-lateral 
myocardial infarction. The top left image is the undeformed slice (slice 0, frame 0). The 
corresponding deformed slice (slice 0, frame 11) is shown on top right. The projected 
motion field (slice 0, frame 11) is shown in lower left. Please note that the motion field 
is truly 3D, and that it was projected into the plane of its respective image slice at 
frame 0 for display purposes. The lower right area of motion field (between 3 and 8 
o’clock positions) indicates akinesis. The picture of the histochemically stained tissue 
slice, roughly corresponding to the same MR image slice location is shown in lower 
right. Brighter myocardial areas correspond to necrotic zones with no dye uptake, and 
darker areas correspond to normal zones where dye is taken up by the tissue. 

Abstract. The estimation of soft tissue deformation from 3D image se- 
quences is an important problem in a number of fields such as diagnosis 
of heart disease and image guided surgery. In this paper we describe a 
methodology for using biomechanical material models, within a Bayesian 
framework which allows for proper modeling of image noise, in order to 
estimate these deformations. The resulting partial differential equations 
are discretized and solved using the finite element method. We demon- 
strate the application of this method to estimating strains from sequences 
of in-vivo left ventricular MR images, where we incorporate information 
about the fibrous structure of the ventricle. The deformation estimates 
obtained exhibit patterns similar to measurements obtained from more 
invasive techniques, used as a gold standard. 



1 Introduction 

There is a class of medical image analysis problems where the goal is the esti- 
mation of the displacement field of an object or a group of objects. Examples 
of such problems are left ventricular (LV) wall motion estimation [HIKilH and 
image guided surgery^. In most of these applications, only a relatively sparse 
set of points, often called landmarks, can be reliably followed on the object from 
the image data, and the remainder of the estimation task can be thought of as 
interpolation. In other words, our problem is: 
given the displacements of such landmarks, find the best displacements for the 
rest of the region of interest. Often, however, the displacement estimates of the 
landmarks are corrupted by noise. In this case, the task becomes an approxima- 
tion problem, where now the goal is to estimate a displacement field that is close 
to the originally estimated displacements at the landmark points, and provides 
reasonable values elsewhere. 

2 Methods 

We will pose this general problem in a Bayesian-Estimation framework where the 
goal is to find the displacement field u which maximizes the posterior probability: 



$$\hat{u} = \arg\max_{u}\, p(u \mid u^m) = \arg\max_{u}\, \frac{p(u^m \mid u)\, p(u)}{p(u^m)} \qquad (1)$$

where \hat{u} is the output displacement field and u^m are the original sparse 
displacement estimates. The prior probability of the measurements p(u^m) is a constant 
once these measurements have been made and therefore drops out of the 
optimization. The first term p(u^m | u) will be derived from the noise model 
assumed in estimating the landmark positions and the second term p(u), the 
prior probability of the displacement, will be derived from a mechanical model. 
For a more detailed discussion see jE). 



2.1 Mechanical Model-based priors 

As previously demonstrated by Christiansen et al. ^ there is a correspondence 
between an internal energy function and a Gibbs prior. If the mechanical model 
is described in terms of an internal energy function W(C,u), where C repre- 
sents the material properties and u the displacement field, then we can write an 
equivalent prior probability density function p(u) (see equation (1)) of the Gibbs 
form: 

$$p(u) = k_1 \exp(-W(C,u)). \qquad (2)$$

We will derive the model term W by a biomechanical model; this can be de- 
scribed in terms of an internal or strain energy function which depends on the 
deformation of the object and its intrinsic material properties. There are differ- 
ent classes of such models depending on the application; in the case of the left 
ventricle we will use an anisotropic linear elastic model which will allow us to 
incorporate information about the preferential stiffness of the tissue along fiber 
directions^. If this method were to be applied to model brain deformation, one 
could use a model adapted from j^. 

Deformation and Strain: Consider a body B(0) which after time t moves and 
deforms to body B(t). A point X on B(0) goes to a point x on B(t) and the 
transformation gradient F is defined by dx = F dX. The deformation is expressed 
in terms of the strain tensor e. Because the deformations to be estimated in this 
work are larger than 5%, we use a finite strain formulation, the logarithmic 
strain e^, which is defined as e = \ln \sqrt{F F'}. Since the strain tensor is a 3 × 3 
symmetric 2nd-rank tensor (matrix), we can re-write it in vector form as e = 
[e_{11} e_{22} e_{33} e_{12} e_{13} e_{23}]'. This will enable us to express the tensor equations in a 
more familiar matrix notation. 
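
As a worked example of the logarithmic strain, consider a pure 10% stretch along the first axis with no rotation (an assumed deformation chosen only for illustration):

```latex
F = \mathrm{diag}(1.10,\ 1,\ 1), \qquad
F F' = \mathrm{diag}(1.21,\ 1,\ 1), \qquad
e = \ln\sqrt{F F'} = \mathrm{diag}(\ln 1.10,\ 0,\ 0) \approx \mathrm{diag}(0.0953,\ 0,\ 0)
```

In vector form this is e = [0.0953 0 0 0 0 0]'. The corresponding small-strain value would be 0.10, so the two measures already differ noticeably at this level of deformation.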

Strain Energy Function: The mechanical model can be defined in terms of a 
strain energy function. The simplest useful continuum model in solid mechanics 
is the linear elastic one, which is of the form W = e'Ce, where C is a 6 × 6 matrix 
that defines the material properties of the deforming body. The left ventricle of 
the heart is specifically modeled as a transversely isotropic elastic material to account for 
the preferential stiffness in the fiber direction, using the matrix C: 



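$$C^{-1} = \begin{bmatrix}
\frac{1}{E_p} & \frac{-\nu_p}{E_p} & \frac{-\nu_{fp}}{E_f} & 0 & 0 & 0 \\
\frac{-\nu_p}{E_p} & \frac{1}{E_p} & \frac{-\nu_{fp}}{E_f} & 0 & 0 & 0 \\
\frac{-\nu_{fp}}{E_p} & \frac{-\nu_{fp}}{E_p} & \frac{1}{E_f} & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{2(1+\nu_p)}{E_p} & 0 & 0 \\
0 & 0 & 0 & 0 & \frac{1}{G_f} & 0 \\
0 & 0 & 0 & 0 & 0 & \frac{1}{G_f}
\end{bmatrix} \qquad (3)$$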



where E_f is the fiber stiffness, E_p is the cross-fiber stiffness, ν_fp and ν_p are the 
corresponding Poisson's ratios, and G_f is the shear modulus across fibers (G_f ≈ 
E_f / (2(1 + ν_fp))). If E_f = E_p and ν_p = ν_fp this model reduces to the more common 
isotropic linear elastic model. Alternatively a different form of W altogether 
could be used such as the one from a Rivlin-Mooney Material Model[0|. 
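
The reduction to the isotropic case can be checked numerically with a short sketch. The entry layout of the matrix below is one reading of equation (3), in particular the third row is an assumption, and the function name and the numerical moduli are hypothetical.

```python
def compliance(Ef, Ep, nu_fp, nu_p):
    """6 x 6 inverse material matrix for a fiber-reinforced elastic solid.

    Ef, Ep are the fiber and cross-fiber stiffnesses; nu_fp, nu_p the
    Poisson's ratios. The layout is an assumed reading of equation (3).
    """
    Gf = Ef / (2.0 * (1.0 + nu_fp))      # approximation quoted in the text
    S = [[0.0] * 6 for _ in range(6)]
    S[0][0] = S[1][1] = 1.0 / Ep
    S[0][1] = S[1][0] = -nu_p / Ep
    S[0][2] = S[1][2] = -nu_fp / Ef
    S[2][0] = S[2][1] = -nu_fp / Ep      # assumed third-row entries
    S[2][2] = 1.0 / Ef
    S[3][3] = 2.0 * (1.0 + nu_p) / Ep
    S[4][4] = S[5][5] = 1.0 / Gf
    return S


# Isotropic check: equal stiffnesses and equal Poisson's ratios.
iso = compliance(Ef=10.0, Ep=10.0, nu_fp=0.3, nu_p=0.3)

# Fiber direction 3.3 times stiffer than the cross-fiber direction
# (the Poisson's ratios here are placeholders, not values from the paper).
aniso = compliance(Ef=3.3, Ep=1.0, nu_fp=0.4, nu_p=0.4)
```

In the isotropic case all normal entries equal 1/E, all coupling entries equal -ν/E, and the three shear entries coincide at 2(1 + ν)/E.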



2.2 Landmark displacement estimation 



In our work, the original displacements on the outer surfaces of the myocardium 
were obtained by using the shape-tracking algorithm whose details were pre- 
sented in HH. We note that other displacement data, including that from mag- 
netic resonance tagging EE3, could also be used. 

The shape-tracking algorithm also produces a set of confidence measures for 
each match. We model these estimates with a Gaussian noise model and generate 
the term p(u^m | u) of equation (1) to be 

$$p(u^m \mid u) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(u - u^m)^2}{2\sigma^2} \right) \qquad (4)$$

where σ² is set to be the reciprocal of the confidence of the particular displace- 
ment estimate. Where no displacement estimates are available the confidence 
is set to zero. 



2.3 Solution using the Finite Element Method 

Having defined both the model p(u) and data p(u^m | u) portions of the problem, 
we can now maximize the posterior in equation (1) to find the optimal displacement field u. 
Taking logarithms and differentiating with respect to the displacement field u 
results in a system of partial differential equations, which we solve using the 
Finite Element Method^. The first step in the finite element method is the 
division or tessellation of the body of interest into elements; these are commonly 
tetrahedral or hexahedral in shape. Once this is done, the partial differential 
equations are written down in integral form for each element, and then the 
integral of these equations over all the elements is taken to produce the final set 
of equations. For more information the reader is referred to standard textbooks such 
as BatheP. The final set of equations is then solved to produce the output set 
of displacements. 
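
The structure of the resulting linear system can be seen in a one-dimensional analogue, sketched below under strong simplifying assumptions: the body is a chain of nodes joined by identical springs (standing in for the elastic energy W), each node may carry a noisy displacement measurement with a confidence equal to the reciprocal of its variance, and the negative log-posterior is minimized by solving a tridiagonal system. This is only an illustration of the idea, not the 3D anisotropic finite element formulation used in the paper, and all names are hypothetical.

```python
def map_displacements(measured, confidence, stiffness=1.0):
    """MAP displacements for a 1D spring chain with Gaussian data terms.

    Minimizes  sum_i (stiffness / 2) * (u[i+1] - u[i])**2
             + sum_i (confidence[i] / 2) * (u[i] - measured[i])**2,
    whose stationarity condition is the tridiagonal system
    (K + C) u = C * measured. At least one confidence must be non-zero.
    """
    n = len(measured)
    lower = [-stiffness] * n          # coefficient of u[i-1] in row i
    upper = [-stiffness] * n          # coefficient of u[i+1] in row i
    diag, rhs = [], []
    for i in range(n):
        neighbours = (i > 0) + (i < n - 1)
        diag.append(stiffness * neighbours + confidence[i])
        rhs.append(confidence[i] * measured[i])
    # Thomas algorithm: forward elimination followed by back substitution.
    for i in range(1, n):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return u


# Landmarks only at the two ends; interior nodes have zero confidence.
measured = [0.0, 0.0, 0.0, 0.0, 4.0]
confidence = [1e6, 0.0, 0.0, 0.0, 1e6]
u_map = map_displacements(measured, confidence)
```

With confident measurements only at the ends, the interior displacements are filled in by the mechanical model, here as a near-linear ramp from 0 to 4.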

3 Results 

In this section we present results from the application of this methodology to 
ten sets of cardiac MR sequences acquired from anesthetized dogs. The resulting 
3D image set consists of sixteen 2D image slices per temporal frame, and sixteen 
temporal 3D frames per cardiac cycle. First the dogs were positioned in the 
magnet for initial imaging under baseline conditions. The left anterior descending 
coronary artery was then occluded and a second set of images was acquired. 
The images were pre-segmented to extract the endo- and epi-cardial boundary 
surfaces and interactively corrected using a platform specially developed for this 
purpose jZ). Then points on the corresponding surfaces were tracked to generate 
the input displacement data using shape-based algorithms described in m. 
The myocardium was modeled as an anisotropic linear elastic material which 
was stiffer in the fiber directions P, shown in figure 1. The tissue was assumed 
to be 3.3 times stiffer along the fiber direction, obtained by linearization of the 
non-linear model from P, and approximately incompressible. 

For each frame between end-systole (ES) and end-diastole (ED), a two step 
problem is posed: (i) solving equation (1) normally and (ii) adjusting the position 
of all points on the endo- and epi-cardial surfaces so they lie on the endo- and epi- 
cardial surfaces at the next frame using a modified nearest-neighbor technique 
and solving equation (1) once more using this added constraint. This ensures 
that there is no bias in the estimation of the radial strain. Figure 2 shows a 
contour map of radial strain (thickening) in a long-axis section of a normal left 
ventricle and in the same animal after occlusion. 



Table: Radial and Circumferential Percentage Strain Changes for Normal and 
Infarcted Regions. 

Percentage change       Radial Normal   Radial Infarct   Circum. Normal   Circum. Infarct 
Our Method (Average)    -16.4%          -135.1%          +18.9%           +77.2% 
Sonomicrometers P       -15.6%          -150.0%          +15.4%           +73.3% 


The validation measures used were the percentage end-systolic strain change 
for the radial and circumferential components between the baseline and post- 
occlusion measurements. The normal and infarcted regions were defined by 
post-mortem measurements. These results are compared to measurements made 
by using implanted sonomicrometers, work performed by members of our re- 
search team and reported in 0, which provide highly accurate strain measure- 
ments by calculating relative Doppler-based displacements, and are used as a 
gold standard. The results are summarized in the table and are consistent with 
the observation that in the case of infarction the tissue thins instead of thickens, 
hence there is a negative change in the radial strain, and it bulges out instead of 
contracting, explaining the positive change in the circumferential strain. For a 
more detailed discussion see a related technical report |S|. 

Fig. 1. Fiber direction in the left ventricle as defined in Guccione et al. 

Fig. 2. Radial strain at end-systole in a normal left ventricle (left) and post- 
occlusion (right), shown in a long-axis sectional view. Normal behavior is thickening 
(positive). Note the infarct region on the right, which is in a darker color. 



4 Conclusions 

In this paper we have described a methodology for the estimation of deformation 
from sequences of 3D images of individual objects, using the left ventricle of the 
heart as a key example. We believe that the best approach to this problem 
involves the modeling of the mechanical properties of the object explicitly in the 
language of continuum mechanics, as this makes possible the incorporation of 
existing theoretical and experimental research in biomechanics, and it provides 
a growth path for solving more difficult problems by naturally invoking more 
sophisticated/appropriate models. In this cardiac work for example, we were 
able to easily take advantage of knowledge of fiber orientation to create a model 
of the heart that is anisotropic and accounts for more of the actual properties of 
the tissue. In the future, we hope to use a non-linear mechanical model which 
will capture the ‘hardening’ of the tissue as it is stretched. We also note that the 
only part of this work that is specific to the left ventricle is the particular strain- 
energy function. By substituting an appropriate matrix C in the case of a linear 
elastic material, or an altogether different form of W in equation (2), 
this method can be used to estimate the deformation of other objects. 

Abstract. A method for non-rigidly deforming 3D PET datasets is de- 
scribed. The method uses a Lagrangian motion field description and a for- 
ward deformation mapping. To regularize the deformation, an anisotropic 
strain energy function is used that separately models the material proper- 
ties of cardiac and background tissues. The method is applied to motion 
compensation in PET so that different time frames of a cardiac sequence 
may be combined. 



1 Introduction 

In gated acquisition of cardiac Positron Emission Tomography (PET), motion 
of the heart is stopped in the images by dividing the data obtained during each 
cardiac cycle into a number of different time frames, or gates. An unfortunate 
effect of distributing the data into many time frames is that the statistical quality 
of each reconstructed volume suffers, and the individual images appear to be very 
noisy. Ideally, one would like to correct the images for cardiac motion, then add 
them back together to obtain a composite image with less motion blur and better 
contrast to noise properties. 

We describe here a deformable motion technique that allows motion com- 
pensation for subsequent combination of PET datasets. A source volume repre- 
senting the heart at end systole will be deformed to match a reference volume 
representing the heart at end diastole. The deformed source will then be summed 
with the reference to produce a composite volume with better contrast to noise 
characteristics. Though a gated cardiac study typically results in some 10 - 15 
gates, each representing a short portion of the cardiac cycle, this paper will just 
focus on the combination of two time frames. Unique in the approach are two 
aspects. First, a non-uniform regularization constraint incorporating anisotropic 
strain energy is used to model the underlying cardiac tissue. Second, a forward 
deformation mapping is used which ensures that each voxel in a source dataset 
contributes to the calculation of a deformed volume. The work is most closely 
related to 3D deformable motion work based on optical flow algorithms m and 
material elastic models m- 

2 Motion Estimation 

As is the case with most 3D deformable algorithms, this algorithm is based 
on two general criteria. An image matching constraint first attempts to find 
a motion field that warps a source volume to best match a reference volume. 
Because numerous image matching transformations exist which equally satisfy 
the image matching constraint, the solution is regularized by imposing an ad- 
ditional criterion constraining motion field smoothness. This latter requirement 
treats the volume as a continuously stretching and bending medium that can 
only deform as is consistent with elastic material models. In our smoothness 
constraint formulation, we use a pre-segmented volume which masks the heart. 
This enables smoothing of the motion field to be carried out differently in cardiac 
tissue than is done in the adjacent tissue and blood pool. 

The motion estimation framework is described as follows. Define two 3D 
density fields, a source volume, f_1(r), and a reference volume, f_2(r), where r = 
(x, y, z) represents the voxel index. A dense Lagrangian motion field is defined 
as m(x,y,z) = (u(x,y,z), v(x,y,z), w(x,y,z)) and the deformed volume of f_1 
is defined as \hat{f}(r) = f_1(r + m). With these definitions, we can express an image 
matching error term, e_I(r), and an anisotropic material strain energy term |S|, 
e_S(r), at each voxel location r, as given in equations (1) and (2), 

where γ_I is a global scalar used to alter the balance between the two error terms, 
λ and μ are elasticity terms called the Lamé constants, and where derivatives of 
the motion field are denoted as u_x = ∂u/∂x. 

It can be seen that the λ term in equation (2) penalizes non-zero divergence 
and the μ term penalizes sharp discontinuities in the motion field. For highly 
incompressible fields, the Poisson ratio, ν = λ/(2(λ + μ)), approaches a max- 
imum of 0.5, which yields a divergence term, λ, that approaches infinity. The 
Lamé constants used in equation (2) are global constants for isotropic materi- 
als. Obviously, the elastic properties of the myocardium are drastically different 
from the blood pool inside the ventricle, and from the adjacent lung tissue and 
air space. In this formulation, we implement an anisotropic elastic model by us- 
ing a segmented voxel mask to delineate voxels representing cardiac tissue, and 
represent λ and μ by vector fields instead of just two global scalars. The vector 
fields for each term take on two values, one value in the region labeled cardiac 
tissue, and another value in the background regions. As such, separate elastic 
properties can be ascribed to cardiac tissue and to adjacent regions. We assume 
here that a technique is available to obtain a reasonably correct segmentation 
of the cardiac tissue from the background, though it is noted that this may not 
always be a trivial task, and may itself be a formidable research question in some 
cases. 
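
The two-valued fields for λ and μ can be sketched as follows. The mask, the numerical constants, and the function names are hypothetical; the only relation taken from the text is ν = λ/(2(λ + μ)). The cardiac pair is chosen so that ν = 0.46, and the background pair so that divergence is not penalized at all.

```python
def poisson_ratio(lam, mu):
    """Poisson ratio from the Lame constants: nu = lam / (2 (lam + mu))."""
    return lam / (2.0 * (lam + mu))


def lame_fields(mask, cardiac, background):
    """Per-voxel lambda and mu fields from a binary tissue mask.

    cardiac and background are (lambda, mu) pairs; voxels inside the mask
    take the cardiac pair and all other voxels take the background pair.
    """
    lam = [[cardiac[0] if m else background[0] for m in row] for row in mask]
    mu = [[cardiac[1] if m else background[1] for m in row] for row in mask]
    return lam, mu


# Hypothetical 3 x 4 slice: True marks voxels segmented as cardiac tissue.
mask = [[False, True, True, False],
        [False, True, True, False],
        [False, False, False, False]]
lam_field, mu_field = lame_fields(mask, cardiac=(11.5, 1.0), background=(0.0, 1.0))
nu_cardiac = poisson_ratio(11.5, 1.0)
nu_background = poisson_ratio(0.0, 1.0)
```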

$$e_I(r) = \gamma_I \left( f_2(r) - \hat{f}(r) \right)^2 \qquad (1)$$

and 

$$e_S(r) = \frac{\lambda}{2}(u_x + v_y + w_z)^2 + \mu(u_x^2 + v_y^2 + w_z^2) + \frac{\mu}{2}\left[ (u_y + v_x)^2 + (u_z + w_x)^2 + (v_z + w_y)^2 \right] \qquad (2)$$

Though the motion field describing the volume deformation is a one-to-one 
mapping in a continuous domain, implementation in a discrete domain involves 
some subtleties that are important to recognize in the deformation of PET 
datasets. Past efforts have used a reverse transformation to calculate 
voxel values in the deformed volume. In this Eulerian formulation, the motion 
vectors describe a particle's motion with respect to its final position. To obtain 
the value of each voxel in the deformed volume, \hat{f}(r) = f_1(r - m), eight voxels 
from the deformation volume are sampled at the location, r - m, and weighted 
according to trilinear interpolation. Such backward sampling does not guarantee 
that each voxel in the source volume will contribute to the deformed volume. 
We use a Lagrangian forward sampling technique which distributes each voxel 
value of the source volume using normalized Gaussian weighting in a single-pass 
calculation of the deformation. Though the forward sampling scheme does not 
guarantee absolute conservation of total voxel intensities, it does guarantee that 
every voxel in the source volume contributes to the deformation volume. Also, 
the normalized Gaussian weighting of the displaced voxels prevents artifacts in 
the non-uniformly sampled deformation. 
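
A one-dimensional sketch of forward sampling with normalized Gaussian weighting is given below. Each source voxel is pushed to its displaced position and spread over nearby destination voxels with Gaussian weights, and each destination voxel is then divided by the total weight it received. Normalizing per destination voxel is one plausible reading of "normalized Gaussian weighting"; that choice, the kernel width, the radius, and the function name are assumptions.

```python
import math


def forward_warp(source, motion, sigma=0.5, radius=2):
    """Forward (Lagrangian) warp of a 1D signal with Gaussian splatting.

    Voxel i is moved to position i + motion[i]; its value is distributed
    to destination voxels within the given radius, and every destination
    voxel is normalized by the sum of the weights it received.
    """
    n = len(source)
    accum = [0.0] * n
    weight = [0.0] * n
    for i, (value, m) in enumerate(zip(source, motion)):
        target = i + m
        centre = int(math.floor(target))
        lo = max(0, centre - radius)
        hi = min(n - 1, centre + radius + 1)
        for j in range(lo, hi + 1):
            w = math.exp(-0.5 * ((j - target) / sigma) ** 2)
            accum[j] += w * value
            weight[j] += w
    return [a / w if w > 0 else 0.0 for a, w in zip(accum, weight)]


# A constant signal is unchanged when there is no motion.
flat = forward_warp([5.0] * 6, [0.0] * 6)

# A single bright voxel at index 2 is moved one voxel to the right.
moved = forward_warp([0.0, 0.0, 10.0, 0.0, 0.0, 0.0], [1.0] * 6)
```

Because every source voxel deposits its value somewhere, none is skipped, in contrast to backward sampling, where a source voxel contributes only if some destination voxel happens to sample near it.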

The overall minimization problem is to find a motion field consistent with 
elastic material properties that best matches the deformed volume to the refer- 
ence volume via a minimization of: 

$$E_{tot} = \sum_{r} \left[ e_I(r) + e_S(r) \right] \qquad (3)$$

We invoke a minimization technique similar to the approach proposed by 
Zhou PI, which linearizes the calculation of an optimal deformed volume by 
using a Taylor series approximation. Assuming the true motion field is m, and 
the current estimate is \hat{m}, then a Taylor series approximation of \hat{f}(r) can be 
expressed in terms of a delta motion field, (δu, δv, δw) = δm = \hat{m} - m, as 
\hat{f}(r) = f_1(r + \hat{m}) - ∇f_1(r + \hat{m}) δm. Substituting the expression, \hat{m} - δm, for 
m in the constraint equations results in a quadratic functional in δm that can 
be minimized via the calculus of variations p|. The resulting Euler-Lagrange 
equations are solved using finite differencing techniques and a conjugate gradient 
method. At each step, \hat{f}(r) is calculated and the conjugate gradient algorithm 
is used to find the best δm satisfying the equations. This delta motion field 
is added to the current total motion field and the procedure is repeated. For 
the results presented in this paper, ten to fifteen iterations of this outer loop 
were typically required to reach an overall solution. Each conjugate gradient step 
usually converges quickly, and also requires some ten to twenty iterations. 



3 Results 

Two cardiac phantoms were used to test the algorithm. The first is a simple 
model of gated emission PET consisting of ellipsoidal building blocks forming 
the human torso |Zj. The second is a finite element model (FEM) based on a 
parametric prolate spheroid description of a left ventricle which has been fitted 
to MRI data acquired from a canine heart 0. The model includes the 
incompressible nature of cardiac tissue and non-symmetric cardiac muscle fiber 
orientation. 

Fig. 1. Ellipsoidal phantom results 

Figure 1 shows the results on the simple model. The source volume represent- 
ing end systole is seen in (a). The reference volume representing end diastole is 
seen as an edge map overlaid on (a). An attempt at deforming the source volume 
using an isotropic strain energy function penalizing non-zero divergence (Pois- 
son ratio = 0.46) shows in (b) that the non-zero divergence in the blood pool 
makes it difficult for the algorithm to find the correct deformation. Relaxing the 
divergence penalty allows a better match, seen in (c). However, the best match is 
obtained using an anisotropic strain energy function penalizing non-zero diver- 
gence and smoothness only in the cardiac tissue (d). Mean squared error (MSE) 
values between the reference volume and cases (b), (c) and (d) are 1727, 1234 
and 555 respectively. Image difference maps between the reference and cases (c) 
and (d) are shown in (e) and (f). These further demonstrate that the anisotropic 
strain energy function produces the warped volume best matching the reference. 
It is noted that in order to find a suitable deformation in case (c), the image 
weighting term needed to be double the value that was used for the anisotropic 
case. This is troublesome, since one would not like to weight the image matching 
criteria so much that physically implausible motions are estimated. 

As a display of the utility of this algorithm, (g), (h) and (i) show a com- 
parison of noisy versions of the phantom summed with and without motion 
compensation. Obviously, if no motion compensation is done, as seen in (h), 
then blur due to the motion is induced which severely obscures image features. 
By first deforming the systole volume to match the heart shape at end diastole, 
and then summing (i), the contrast to noise ratio is improved over the reference 
volume alone (g). This is the desired result which allows us to combine gated 
PET datasets and increase image quantification without loss of resolution. 

Fig. 2. Parametric FEM results 


Results using the FEM are seen in Fig. 2. A 16 element model was used 
to determine the shape of the left ventricle as it was passively inflated. Here 
the inflated state is used as a reference volume, and the deflated state is the 
source volume. Because a parametric description of the two states is available, 
the “ground truth” motion vectors may be calculated which bring any two points 
into correspondence. The source volume and an edge map of the reference are 
seen in (a). To better visualize performance of the deformation algorithm, tex- 
ture was added to the model by giving each of the 16 elements a slightly different 
voxel value. Deformed volumes using isotropic strain (b) and anisotropic strain 
(c) look similar; both match the reference fairly well. MSE values with respect 
to the reference are 1117 and 1002 respectively, so the anisotropic model per- 
forms only slightly better with respect to this measure. Comparing motion field 
magnitudes of the isotropic (d) and anisotropic (e) results versus the true
motion field magnitude (f) reveals that the anisotropic model is considerably more
accurate with respect to this measure. MSE values of the true magnitude vol- 
ume (f) compared to (d) and (e) are 36661 and 17481 respectively. The motion 
magnitude images point out how the isotropic strain model falters in the region 
where image divergence is present (in the blood pool). Since there was a zero 
background in this case, the motion field error in the background region does 
not induce much error in the deformed volume for the isotropic case. This would 
not be true in general for real PET data where voxel intensities in the blood 
pool would be small, yet not negligible. 
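
The MSE figures quoted above are voxel-wise comparisons between a reference volume and a warped (or motion-magnitude) volume. A minimal sketch of that measure follows, assuming both volumes are NumPy arrays of identical shape; the function name and the toy values are illustrative and are not taken from the phantom experiments.

```python
import numpy as np

def mean_squared_error(reference, warped):
    """Mean of the squared voxel-wise differences between two volumes."""
    reference = np.asarray(reference, dtype=float)
    warped = np.asarray(warped, dtype=float)
    if reference.shape != warped.shape:
        raise ValueError("volumes must have the same shape")
    return float(np.mean((reference - warped) ** 2))

# Toy 3-D volumes (hypothetical values, not the phantom data)
reference = np.zeros((4, 4, 4))
reference[1:3, 1:3, 1:3] = 100.0
warped = reference.copy()
warped[1, 1, 1] = 90.0          # one voxel differs by 10

mse = mean_squared_error(reference, warped)
```

The same function can be applied to motion-magnitude volumes, which is how the comparison of (d) and (e) against (f) can be read.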
4 Concluding Remarks

When deforming a volume to match a reference dataset, there is always a bal- 
ance between the weight of the image matching constraints and the regularization 
constraints. Because numerous motion fields can produce identical deformed im- 
ages, it is the function of the regularization constraints to prevent physically 
unrealizable motion fields. In the deformation of real PET datasets, where con- 
siderable statistical noise is present, there is always the danger of weighting 
the image matching terms too greatly so that uncorrelated “hot spots” in the
datasets are matched even though they do not originate from the same segment 
of cardiac tissue. The motivation for this work was to incorporate a more realis- 
tic, nonuniform elastic model into the regularization constraint so that this term 
could be weighted more heavily, and thus would prevent solutions with physically 
implausible motion fields. Though the technique required a prior segmentation 
step, because the segmentation was only used during the regularization process, 
and not during the final image warping calculation, the algorithm should not 
be sensitive to minor segmentation errors. The improvements shown in this pa- 
per by the anisotropic model over the isotropic strain model indicate that this 
more realistic model can be worth the added expense of the requirement for a 
segmented cardiac volume. 
Abstract. Inspired by the discussion in neurological research about the
callosal fiber connections with respect to brain asymmetry we developed 
a technique that measures distances between brain hemispheres in a non- 
Euclidean, curvilinear space. The technique is a generic morphometric 
tool for measuring minimal distances within and across 3-D structures. 
We applied the technique for distances from the cortical gray/white mat- 
ter boundary to the cross-section of the corpus callosum. The method 
uses a 3-D extension of the F*-algorithm. The algorithm uses a cost ma- 
trix determined by the image data. The resulting distances are mapped 
to the cortical surface and differences on the two hemispheres can be visu- 
ally compared. Distances were also projected back to the corpus callosum 
to represent asymmetry by comparing left and right measurements. We
present results obtained by processing 11 3-D magnetic resonance
data sets representing a normal control group.



1 Introduction 

Image analysis has become a common component to study diseases of the human 
body by obtaining anatomical and functional information. Since the advent of 
non-invasive magnetic resonance imaging, morphometry has become increasingly 
important. The new analysis methods described here are fully 3-D processing 
techniques and overcome limitations of conventional slice-by-slice analysis. 

This project is driven by studying schizophrenia. In schizophrenia, changes 
in the morphology of various brain structures are thought to provide important 
clues to the disease related brain abnormalities, but the changes are subtle and 
are barely detectable with current interactive segmentation techniques [1,2].
Quantitative measurements on postmortem brains and on anatomical structures 
segmented from magnetic resonance image data corroborate the hypothesis that 
the asymmetry between the brain hemispheres is reduced at first episodes of 
schizophrenia [4]. To date, the errors in measurements are often larger than
the effect to be studied, and interesting findings often could not be confirmed by 
other research groups. Therefore, it becomes necessary to provide more accurate 
measurements of brain asymmetry. Bullmore et al. [3] proposed a measurement
called radius of gyration to assess cerebral asymmetry. This measurement has
only been applied to 2-D coronal slices. Prima et al. [12] used non-linear elastic
registration to find corresponding regions in the two hemispheres. Differential
operators applied to the deformation field result in measures of lateral asymmetry.

Symmetry of structures under a class of spatial transformations is a well- 
defined mathematical property. However, dealing with biological structures and 
the inherent variability, the mathematical approach to exact symmetry is too 
strict and has to be modified. Guillemaud et al. [8] segmented the manifold of
the interhemispheric fissure and determined length between the cortical surface 
and the fissure along perpendicular lines emanating from the midplane. Measures 
from the left and right cortical surface result in estimates of local asymmetries 
and in a quantitative 2-D asymmetry map. This paper also suggested the use of a 
curvilinear coordinate system of the brain directly related to brain morphology. 
The encouraging results inspired the research work presented in this paper. A 
more realistic simulation of white matter fiber connections, however, would have 
to include information about local fiber directions, as nicely presented in [10] and
[11], for example. The search for minimum cost paths is a 3-D extension of the
F* algorithm [5] and has similarities to the interactive live-wire segmentation in
[9]. In the context of analyzing the white matter structure of the brain we also
would like to refer to Mangin et al. [7], who proposed a discrete implementation of
conservative flow systems to analyze the white matter, in particular to detect the
corpus callosum. Due to lack of space, details of implementations are generally
omitted here, but are described in [13] (full color version).



2 Optimal Path Algorithm and Asymmetry Measurement 

In our proposed approach, callosal fibers are simulated by curvilinear paths of 
minimal distance running inside the white matter from the white matter bound- 
ary to the interhemispheric cut through the corpus callosum. We use distance 
measurements propagated along trajectories determined by the graph search 
algorithm F*, extended to fit our specific needs. The distances at the white mat- 
ter boundary are projected back onto the corpus callosum for a comparison of 
asymmetry between the two hemispheres. 

The F*-algorithm used in our implementation is based on the approach of 
Tenenbaum [5], where pixels or voxels of a dataset are represented by nodes
of a graph. The edges of the graphs are defined as the 8-neighborhood in 2-D 
space and as the 26-neighborhood in 3-D space. The F*-algorithm enables the 
calculation of a distance map from a certain point of reference (‘seed’) to any 
other point in the graph. This distance map assigns a distance value to each node
in the graph which is based on a cost function that determines the point-related 
cost of a path. 

To fit our needs we have implemented several extensions of the original F*
algorithm: 1) extension to 3-D space, 2) use of a seed region instead of a single
seed point to allow multiple seed regions (see Fig. 1), 3) calculation of accurate
costs for paths running along diagonals, with an additional correction for each
dimension if voxel dimensions are non-uniform, 4) propagation of additional
information and measurements from the seed region to all points.

Fig. 1. 2-D F* distance maps representing the distance of the sea route to the closest
seed point: (a) single seed point (at Barcelona) with highlighted optimal path running
to Stockholm, (b) multiple seed points, (c) seed region (border of Ireland), (d) single
seed, cost matrix penalizing optimal paths running far off the coast

Fig. 2. Visualization of arbitrary optimal paths based on a constant cost matrix (b)
and based on an a-posteriori probability cost matrix (c) on a 2-D MRI image (a)

The F* algorithm needs the cost function to be stored as a matrix which
represents the point-related costs for each point. The cost matrix was modeled
to make optimal paths less likely to run through certain regions, using two terms:
a constant distance term and a penalty term. The penalty term assigns high
costs to points where paths should be less likely to run through (see Fig. 1). The
resulting path lengths are therefore not measured in unit size, requiring a
modification of the F* algorithm to additionally calculate the unit size distances.
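
The distance-map idea described above (seed region, 8-neighborhood, diagonal and spacing corrections, point-related cost matrix) can be sketched in 2-D as follows. This is not the F* algorithm itself: the sketch swaps in a priority-queue (Dijkstra-style) propagation, which computes minimum-cost distances on the same graph. The function name, the averaging of the two pixel costs along a step, and the toy cost matrix are assumptions made for illustration.

```python
import heapq
import math
import numpy as np

def distance_map(cost, seeds, spacing=(1.0, 1.0)):
    """Minimum accumulated cost from any seed pixel to every pixel (8-neighborhood)."""
    cost = np.asarray(cost, dtype=float)
    rows, cols = cost.shape
    dist = np.full(cost.shape, np.inf)
    heap = []
    for r, c in seeds:                      # a seed region is simply a list of seeds
        dist[r, c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                        # stale queue entry
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            # Geometric step length: handles diagonals and non-uniform spacing
            length = math.hypot(dr * spacing[0], dc * spacing[1])
            # Assumed point-related cost of a step: mean of the two pixel costs
            nd = d + length * 0.5 * (cost[r, c] + cost[nr, nc])
            if nd < dist[nr, nc]:
                dist[nr, nc] = nd
                heapq.heappush(heap, (nd, nr, nc))
    return dist

# Constant distance term of 1 plus a high-penalty wall in column 2 (rows 1..3)
cost = np.ones((5, 5))
cost[1:4, 2] = 50.0
dmap = distance_map(cost, seeds=[(2, 0)])
```

With the penalty wall in place, the cheapest route from the seed at (2, 0) to (2, 4) detours around the wall instead of crossing it. A 3-D version would only change the neighborhood to 26 steps and add a third spacing component.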

Optimal paths are not an explicit result of the F* algorithm; they
are extracted from the distance map using a steepest descent approach that traces
trajectories back to the seed points (see Figs. 2 and 3).
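
A minimal 2-D sketch of this steepest-descent back-tracing is given below. It assumes a distance map that is zero at the seed points; for a self-contained example the map is generated analytically for a single seed with unit costs (the so-called octile distance) rather than by the F* algorithm. The function name and the toy map are illustrative assumptions.

```python
import math
import numpy as np

def trace_path(dist, start):
    """Follow the steepest descent of a distance map from `start` to a seed."""
    rows, cols = dist.shape
    path = [start]
    r, c = start
    while dist[r, c] > 0:
        best, best_slope = None, 0.0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                # Descent per unit step length (diagonal steps are longer)
                slope = (dist[r, c] - dist[nr, nc]) / math.hypot(dr, dc)
                if slope > best_slope:
                    best, best_slope = (nr, nc), slope
        if best is None:
            raise ValueError("local minimum reached before a seed")
        r, c = best
        path.append(best)
    return path

# Distance map for one seed at (0, 0) with unit costs and 8-neighborhood
ii, jj = np.indices((6, 6))
dist = np.maximum(ii, jj) + (math.sqrt(2) - 1) * np.minimum(ii, jj)
path = trace_path(dist, (5, 3))
```

Because every step strictly lowers the distance value, the trace terminates at a seed whenever the map has no spurious local minima.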

So far, we have calculated distances at the white matter boundary. However,
such a visualization is rather unusual and requires training. More common is a
projection of attributes to the cortical surface, which also allows a comparison
between multiple brain surfaces. We have developed a method to project the
calculated distances from the gray/white matter boundary outwards to the
cortex through gray matter using the F* algorithm. The projection runs along the
optimal path from the white matter boundary to the cortex (see Fig. 3).

The main problem in defining asymmetry measurements is to determine
correspondence. Establishing correspondence between brain hemispheres is not well
defined since the brain is not strictly symmetric and exhibits structures which
appear only in one hemisphere. The approach chosen in our application is to
compare the distances projected back to the corpus callosum along optimal paths.
Measurements projected from both sides to one point of the corpus callosum can
therefore be compared directly for asymmetry.

Fig. 3. Application on a 3-D brain atlas. Visualization of calculated distances on the
white matter boundary (a) and as projection to the cortex (b). Visualization (c) of
arbitrary paths and corpus callosum

We define our asymmetry measurement as the difference of the mean of the 
distances after averaging the distance values separately for each side. These 
differences can be visualized as a 2-D difference graph or can be projected back 
to the cortex for visualization. 
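
The asymmetry measurement defined above can be sketched as follows, assuming that each boundary point has already been assigned the index of the callosal point it projects to. The function name, the argument layout and the toy numbers are hypothetical; callosal points that receive no measurement from one side are marked as NaN.

```python
import numpy as np

def asymmetry_graph(cc_left, dist_left, cc_right, dist_right, n_cc_points):
    """Per callosal point: mean left distance minus mean right distance."""
    def mean_per_point(index, dist):
        index = np.asarray(index)
        dist = np.asarray(dist, dtype=float)
        sums = np.bincount(index, weights=dist, minlength=n_cc_points)
        counts = np.bincount(index, minlength=n_cc_points)
        with np.errstate(invalid="ignore", divide="ignore"):
            return np.where(counts > 0, sums / counts, np.nan)
    return mean_per_point(cc_left, dist_left) - mean_per_point(cc_right, dist_right)

# Hypothetical distances (mm) projected back to callosal points 0 and 1
graph = asymmetry_graph(
    cc_left=[0, 0, 1], dist_left=[10.0, 20.0, 30.0],
    cc_right=[0, 1, 1], dist_right=[12.0, 20.0, 30.0],
    n_cc_points=3,
)
```

Positive entries indicate longer left-hemispheric paths at that callosal point (left minus right).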

3 Results 

The proposed algorithm has been applied on 2-D datasets without a symmetry
axis, such as maps (see Fig. 1) and mazes, to test and extend the functionality of
the F* algorithm. Further 2-D tests involved datasets with a symmetry axis at
the seed region, like artificial images, images of butterflies, bats, plants and 2-D
slices of a brain atlas. The mean distance asymmetry measure was shown to be
superior to extrema or median measures. Corresponding catchment areas on the 
symmetry axis showed high variability in cases when areas were hidden behind 
obstacles. In such cases the asymmetry measurement turned out to be poor. 

The first 3-D test has been performed on an isotropic brain atlas. Distances
and paths were calculated and visualized (Figs. 3a-c and 4). There were significant
visual differences observed between the two hemispheres. The difference graph
of the mean was determined from the asymmetry measurement and visualized
(Fig. 4c). Both the asymmetry graph and the distance visualization on the cortex
demonstrated that the left hemispheric paths were longer for most parts of the
brain. Compared to 2-D, we observed a lower variance of the size of catchment
areas, but the correspondence was still not solved to our satisfaction. One reason
is that the corpus callosum is small compared to the white matter, so rather
large areas are projected onto a single point on the corpus callosum.

As further 3-D tests, datasets of 10 control patients of an Organic Amnesia
study, varying in age and sex, have been processed. Both the corpus callosum
and the brain hemispheres were segmented manually. The segmentation of the
brain tissues has been performed using statistical classification with the Bayes
classifier. The a posteriori probabilities were used to calculate the cost matrix.
Distances have been visualized (see Fig. 5), and there were again significant
visual differences between the two hemispheres in all datasets. The asymmetry
measurements have not yet been calculated. We observed that manual
segmentation of the corpus callosum is rather poor, resulting in displacements from
the interhemispheric fissure as large as a few millimeters. These displacements are
of equal size as the mean differences of the distances for the atlas.

Fig. 4. Application on a 3-D brain atlas. Visualization of distances (a) and of
normalized asymmetry measurement (b) as a projection on the cortex. Areas of smaller
distance are displayed in blue. (c) Visualization of the difference graph (left minus
right) projected on the corpus callosum

Fig. 5. Application on real 3-D datasets: Visualization of distances projected on the
cortex from superior (a) and inferior (b) viewpoints. Visualization of the
correspondence as projection on the cortex (c) and of the color-coded labels on a slice (d)

4 Conclusions and Discussion 

In this paper, we have presented a new approach to measure minimum cost 
paths in a non-Euclidean curvilinear space. We use such paths as a simulation 
of callosal white matter fiber tracts which are of interest in current neurological 
research. We also proposed a technique to calculate a rough correspondence and 
an associated asymmetry measurement. Results are promising, but the corre- 
spondence especially needs improvement. The method has been applied to 11 
3-D datasets so far, and the implementation is stable and reliable. 

The distances determined with our method are based on the city-block
metric with the inherent disadvantages of showing large deviations from Euclidean
distance measurements and of non-isotropic propagation of distances in space.
Kiryati et al. [6] have addressed this issue and have proposed a correction of the
calculated distances. We plan to incorporate this correction in a future version of
the method.
Future directions of our research include the generation of a more robust mea- 
surement of asymmetry, combining curvilinear distances with explicitly estab- 
lished lateral correspondence between brain hemispheres. 

Acknowledgments 

This work was supported by the EC-funded BIOMORPH project 95-0845, a 
collaboration between the Universities of Kent, Oxford, ETH Zurich, INRIA 
Sophia Antipolis and KU Leuven. Project funding is provided by the Swiss 
Federal Office for Education and Science (BBW Nr 95.0340). 

References 

1. Shenton, M.E.: Psychopathology: The Evolving Science of Mental Disorders,
chapter Temporal lobe structural abnormalities in schizophrenia: A selective review
and presentation of new MRI findings. Cambridge University Press (1996)

2. Shenton, M.E.: Brain Imaging in Clinical Psychiatry, chapter MRI studies in
schizophrenia. Marcel Dekker Inc. (1997)

3. Bullmore, E., Brammer, M., Harvey, Murray, R., Ron, M.: Cerebral hemispheric
asymmetry revisited. Psych. Medicine 25 (1995) 249-363

4. Crow, T., Colter, N., Frith, C., Johnstone, E., Owens, D.: Developmental arrest of
cerebral asymmetries in early onset schizophrenia. Psych. Res. 29 (1989) 247-253

5. Fischler, M., Tenenbaum, J., Wolf, H.: Detection of roads and linear structures in
low-resolution aerial imagery using a multisource knowledge integration technique.
Computer Graphics and Image Processing 15 (1981) 201-223

6. Kiryati, N., Szekely, G.: Estimating shortest paths and minimal distances on
digitized three-dimensional surfaces. Pattern Recognition (1993)

7. Mangin, J., Regis, J., Frouin, V.: Shape bottlenecks and conservative flow systems.
Proc. MMBIA (1996) 319-328

8. Marais, P.G., Guillemaud, R., Sakuma, M., Feldmar, J., Crow, T., Zisserman, A.,
Brady, M.: Visualising cerebral asymmetry. (1996) 1131 411-416

9. Mortensen, E.N., Barrett, W.A.: Fast, accurate, and reproducible live-wire
boundary extraction. Proc. Visualization in Biomedical Computing (1996) 183-192

10. Peled, S., Gudbjartsson, H., Westin, C., Kikinis, R., Jolesz, F.A.: Magnetic
resonance diffusion tensor imaging demonstrates direction and asymmetry of human
white matter fiber tracts. Brain Research 780 (1998) 27-33

11. Poupon, C., Mangin, J., Frouin, V., Regis, J., Poupon, F., Pachot-Clouard, M.,
Bihan, D.L., Bloch, I.: Regularization of MR diffusion tensor maps for tracking brain
white matter bundles. Proc. Medical Image Computing and Computer-Assisted
Intervention (1998) 489-498

12. Prima, S., Thirion, J., Subsol, G., Roberts, N.: Automatic analysis of normal
dissymmetry of males and females in MR images. Proc. Medical Image Computing
and Computer-Assisted Intervention (1998) 770-779

13. Styner, M., Coradi, T., Gerig, G.: Brain morphometry by distance measurement
in a non-Euclidean, curvilinear space. Tech. report, UNC-CS Department (1999)



Learning Shape Models from Examples Using
Automatic Shape Clustering and Procrustes Analysis



Nicolae Duta^1, Milan Sonka^2, and Anil K. Jain^1

^1 Department of Computer Science and Engineering, Michigan State University, USA
^2 Department of Electrical and Computer Engineering, The University of Iowa, USA

dutanico@cse.msu.edu



Abstract. A new fully automated shape learning method is presented.
It is based on clustering a shape training set in the original shape space
and performing a Procrustes analysis on each cluster to obtain a cluster
prototype and information about shape variation. As a direct application
of our shape learning method, a 17-structure shape model of brain
substructures was computed from MR image data, and an eigen-shape model
was automatically derived. Our approach can serve as an automated
substitute to the tedious and time-consuming manual shape analysis.



1 Motivation 

Automated learning of shape models has direct implications in medical image 
interpretation. We and others have previously demonstrated the utility of incor- 
porating shape in medical image segmentation and interpretation fP . However, 
training a shape-based segmentation system is mostly done manually follow- 
ing a tedious and therefore impractical process. We report a novel approach to 
automated learning of shape models from examples and demonstrate its utility. 

We have developed a novel solution to the shape
reparameterization-alignment-averaging problem. The main difference from previously reported
methods m is that the training set is first automatically clustered and those 
shapes considered to be outliers are discarded. The second difference is in the 
manner in which registered sets of points are extracted from each shape contour. 



2 Background and Notation 



A shape instance $A = \{s_i^A\}_{i=1..n} = \{(x_i^A, y_i^A)\}_{i=1..n}$ is a set of
points in the 2-D Euclidean space. A shape instance B is called aligned to a shape
instance A if the sum of squares

$$SS(A, B) = \sum_{i=1}^{n} \left[ (x_i^A - x_i^B)^2 + (y_i^A - y_i^B)^2 \right]$$

cannot be decreased by scaling, rotating or translating B. In this case SS(A, B) is
called the Procrustes sum of squares PSS(A, B).
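
As an illustration of this definition, the following sketch computes PSS(A, B) by first bringing B into alignment with A through the optimal translation, rotation and scale, and then evaluating the sum of squares. It assumes that both shapes are (n, 2) NumPy arrays with points in corresponding order; the closed-form SVD solution used here is a standard technique and is not claimed to be the computation used by the authors.

```python
import numpy as np

def procrustes_sum_of_squares(A, B):
    """SS(A, B) after optimally translating, rotating and scaling B onto A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A0 = A - A.mean(axis=0)                  # remove translation
    B0 = B - B.mean(axis=0)
    U, S, Vt = np.linalg.svd(B0.T @ A0)
    d = np.sign(np.linalg.det(U @ Vt))       # -1 would correspond to a reflection
    D = np.diag([1.0, d])
    R = U @ D @ Vt                           # optimal rotation (row-vector convention)
    s = (S * np.diag(D)).sum() / (B0 ** 2).sum()   # optimal scale
    return float(((A0 - s * B0 @ R) ** 2).sum())

# B is A rotated by 0.5 rad, scaled by 3 and translated: PSS should vanish
A = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
theta = 0.5
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
B = 3.0 * A @ rot.T + np.array([5.0, -2.0])
pss_value = procrustes_sum_of_squares(A, B)
```

Reflections are excluded on purpose, since the definition only allows scaling, rotating and translating B.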



^ See http://web.cse.msu.edu/~dutanico for a complete paper and a set of results. 
The Procrustes average shape of a set of shapes $\{A_k\}$ is a shape instance
near the center of the empirical distribution of the $A_k$'s in the shape space. For a
detailed definition, properties and ways of computing an average shape see [2].
Let $A = \{(x_j^A, y_j^A)\}_{j=1..p}$ and $B = \{(x_k^B, y_k^B)\}_{k=1..r}$ be two shape
instances. A match matrix $M = \{M_{jk}\}$ is defined by:

$$M_{jk} = \begin{cases} 1, & \text{if point } a_j \text{ corresponds to point } b_k, \\ 0, & \text{otherwise.} \end{cases}$$

We consider 0-1 match matrices M corresponding to symmetric one-to-one
links (point correspondences); that is, a point $a_j \in A$ can have at most one
corresponding point $b_k \in B$, in which case the correspondence is symmetric.
The points from both sets that have no correspondence are called outliers.
Let $A_M$ and $B_M$ be the subsets of A and B matched by M and $PSS(M) =
PSS(A_M, B_M)$. We define a search criterion to be minimized over the match
matrices space as: $f(M) = [PSS(M)/n + K]/n$, where n is the number of links
in M and K is a constant. This functional encodes the fact that we are will- 
ing to trade a q% increase in average PSS for a p% increase in the number of 
correspondences. It also helps avoid the shrinking effect described in Q. 

3 Problem Definition and Solution Outline 

Mathematically speaking, we present a solution to the following problem: Given 
a set of m shape instances $S_k = \{(x_i^k, y_i^k)\}_{i=1..n_k}$, $k = 1..m$, partition it
into a set of clusters
and, for each shape cluster, compute a prototype (Procrustes mean shape). The
set of shape prototypes will be used as models for detection of object instances in 
new images by means of deformable template segmentation. Our shape learning 
method consists of the following main steps: 

Algorithm 1: Shape Learning Outline

1. For each (evenly sampled) shape $S_k$ in the training set compute a polygonal
approximation $S'_k$.

2. For each $j, k = 1..m$ perform a flexible one-to-one registration (mapping) of
$S'_k$ to $S_j$. If the registration succeeds, define a set $T_{j,k}$ as the subset of $S_j$
that corresponds (was matched) to the points of $S'_k$, otherwise set $T_{j,k} = \emptyset$.

3. Compute a pseudo-distance matrix $\mathcal{D} = \{d_{j,k}\}_{j,k=1..m}$,
where $d_{j,k} = PSS(T_{j,k}, S'_k)/|T_{j,k}|$ if $T_{j,k} \neq \emptyset$ or $d_{j,k} = \infty$ otherwise.

4. Set the current training set equal to the original set of m shapes:
$CTS = \{S_k\}_{k=1..m}$. While $CTS \neq \emptyset$ do

(a) Find the shape approximation $S'_{i_0}$ that has the least average distance to
the shapes $S_j \in CTS$ (the best fit shape to the current training set).

(b) Extract from CTS and put in a cluster all the shapes $S_{i_1}, .., S_{i_p}$ to which
$S'_{i_0}$ can be fit ($d_{i_k,i_0} < \infty$).

(c) The cluster prototype is defined as the Procrustes average of $T_{i_1,i_0}, ...,
T_{i_p,i_0}$. The shape variance inside the cluster is defined as the covariance
matrix of the aligned set $\{T_{i_k,i_0}\}_{k=1..p}$.
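
Step 4 of the outline can be sketched as a greedy loop over the pseudo-distance matrix. The sketch assumes a precomputed matrix D with `np.inf` marking failed registrations. Because an average that includes infinite entries is itself infinite, the code makes an explicit assumption that is not stated in the text: the seed is the approximation that fits the most remaining shapes, with ties broken by the mean of its finite distances. All names are hypothetical.

```python
import numpy as np

def extract_clusters(D):
    """Greedy cluster extraction from a pseudo-distance matrix D.

    D[j, k] is the distance between original shape j and approximation k
    (np.inf when the registration failed). Returns (seed k, members) pairs.
    """
    D = np.asarray(D, dtype=float)
    remaining = list(range(D.shape[0]))
    clusters = []
    while remaining:
        best_k, best_key = None, None
        for k in range(D.shape[1]):
            d = D[remaining, k]
            fits = np.isfinite(d)
            if not fits.any():
                continue
            # Assumption: most fitted shapes first, then smallest mean distance
            key = (-int(fits.sum()), float(d[fits].mean()))
            if best_key is None or key < best_key:
                best_k, best_key = k, key
        if best_k is None:
            # Nothing fits the leftover shapes: keep them as singletons
            clusters.extend((None, [j]) for j in remaining)
            break
        members = [j for j in remaining if np.isfinite(D[j, best_k])]
        clusters.append((best_k, members))
        remaining = [j for j in remaining if j not in members]
    return clusters

inf = np.inf
D = np.array([[0.0, 1.0, 2.0, inf],
              [1.0, 0.0, 1.0, inf],
              [2.0, 1.0, 0.0, inf],
              [inf, inf, inf, 0.0]])
clusters = extract_clusters(D)
```

In this toy matrix, shapes 0 to 2 form the main cluster around approximation 1, and shape 3 ends up alone, which is how an outlier would show up.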
The shape approximations computed in Step 1 of the learning algorithm have
about three times fewer points than the original shapes in order to smooth small
shape artifacts, noise and quantization effects. They are only used to extract
subsets of corresponding points from the original shapes, providing an easier task for
the registration algorithm and implicitly bringing together the extracted subsets
into a common parameterization frame. Indeed, if a point $s'_i$ on a polygonal
approximation $S'$ is registered to $s_{i_1} \in S_1$, $s_{i_2} \in S_2$, ..., $s_{i_m} \in S_m$
($S_1, ..., S_m$ are original shapes that form a cluster), then by transitivity,
$s_{i_1}, ..., s_{i_m}$ are correspondents on $S_1, ..., S_m$ of one vertex of an average
shape. This also ensures that the shape variation present in the original data is
completely preserved if the registration process is precise.

The employed shape registration method consists of two stages: (i) Similarity 
registration of two arbitrary sets of points and (ii) Non-linear registration based 
on local similarity of two curves: 

Algorithm 2 (Global similarity registration)

1. Set $V_{min} = \infty$.

2. For every pair of points $(a_{j_1}, a_{j_2}) \in A \times A$

For every pair of points $(b_{k_1}, b_{k_2}) \in B \times B$ do steps (a) through (e)

(a) Find the similarity transformation $\psi$ that aligns the sets $\{a_{j_1}, a_{j_2}\}$ and
$\{b_{k_1}, b_{k_2}\}$.

(b) Apply $\psi$ to all the points in B to obtain B'.

(c) For every point $b_k$ of B', find its nearest neighbor $NN(b_k)$ in A. If the
distance between $b_k$ and $NN(b_k)$ is smaller than a threshold T
(automatically set equal to 10% of the scale of B) then set a correspondence
between the two. A match matrix M between A and B is constructed in
this way. Since two points from B' can have the same nearest neighbor
in A, we enforce on M a one-to-one correspondence requirement. That
is, allow a point to be linked to its second to fifth nearest neighbor if the
first one can be assigned to a closer point in B', and the length of the
link does not exceed T.

(d) Compute $f(M)$.

(e) If $f(M) < V_{min}$ then $V_{min} = f(M)$, $\psi_{min} = \psi$.

3. Apply $\psi_{min}$ to all the points in B to obtain B'.

4. For every point $b_k$ of B', find its nearest neighbor $NN(b_k)$ in A. If the
distance between $b_k$ and $NN(b_k)$ is smaller than T then set the correspondence
between the two. A match matrix M' between A and B is constructed in
this way and enforced to correspond to one-to-one links.

5. Find the linear transformation $\psi_{final}$ that aligns the sets $A_{M'}$ and $B_{M'}$.
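
Steps (a) to (c) of Algorithm 2 can be sketched compactly by treating 2-D points as complex numbers, so that a similarity transformation becomes z -> alpha*z + beta. The one-to-one requirement is enforced here by a simple greedy rule (shortest links first), which is a stand-in for the second-to-fifth nearest neighbor rule described above and not the same procedure. Function names and the threshold value are hypothetical.

```python
import numpy as np

def two_point_similarity(a1, a2, b1, b2):
    """Similarity z -> alpha*z + beta (complex form) mapping b1 -> a1, b2 -> a2."""
    alpha = (a2 - a1) / (b2 - b1)     # rotation and scale
    beta = a1 - alpha * b1            # translation
    return alpha, beta

def one_to_one_links(A, B_prime, T):
    """Greedy one-to-one links (j, k) between A[j] and B'[k], link length < T."""
    dist = np.abs(B_prime[:, None] - A[None, :])       # |B'| x |A| distances
    pairs = sorted((dist[k, j], k, j)
                   for k in range(len(B_prime)) for j in range(len(A))
                   if dist[k, j] < T)
    used_a, used_b, links = set(), set(), []
    for _, k, j in pairs:                               # shortest links first
        if k not in used_b and j not in used_a:
            links.append((j, k))
            used_a.add(j)
            used_b.add(k)
    return sorted(links)

# B is A rotated by 90 degrees, scaled by 2 and translated
A = np.array([0, 1, 1 + 1j, 1j])
B = 2j * A + (3 - 1j)
alpha, beta = two_point_similarity(A[0], A[1], B[0], B[1])
B_prime = alpha * B + beta
links = one_to_one_links(A, B_prime, T=0.1)
```

In the full algorithm this would be repeated for every pair of point pairs, keeping the transformation with the smallest f(M).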

We are interested not only in computing an average shape (which is robust 
to slight misregistrations) but also the shape variation present in the data set 
which is best described by the set of high curvature points. Since a global linear 
registration does not necessarily perform a good local registration (see 0), we 
need to locally refine the results of the global registration such that corresponding 
points of high curvature from the two data sets are matched together. However, 
some high-curvature points in A may not correspond to high curvature points
in B, therefore we do not enforce this requirement explicitly, but rather through 
local similarity registration and monotonicity. We define the term “local” in a 
topological sense according to the natural point ordering along curves A and B. 
A good registration should be monotonic, that is, preserve the topologies (point 
ordering) on the two shapes. 

Algorithm 3 (Monotonic, local similarity-based registration)

Input: two sets of points A and B and a set $\mathcal{M}$ of one-to-one links between some
subset A' of A and a subset B' of B obtained by global similarity registration.

1. Cyclically reorder the points of A, B and the links in $\mathcal{M}$ such that point $a_1$
corresponds to point $b_1$.

2. If the number of inversions (pairs of points $a_i$ and $a_j$ corresponding to $b_k$
and $b_l$, in this order, such that $i < j$ and $k > l$) exceeds $|\mathcal{M}|/2$, reverse the
ordering of the points in A.

3. Break the smallest number of links in $\mathcal{M}$ such that there are no more
inversions. (Note that we are left with a monotonic registration.)

4. For $i = 1..|B|$ do

(a) Find a topological neighborhood of $b_i$, $[b_l, b_{l+1}, ..., b_i, ..., b_{r-1}, b_r]$ (the
actual size of the neighborhood depends on the curvature at $b_i$: the larger
the curvature, the smaller the neighborhood) such that both $b_l$ and $b_r$
have correspondences in A; let them be $a_{l'}$ and $a_{r'}$ with $l' < r'$.

(b) Perform a similarity registration between the sets $[a_{l'}, a_{l'+1}, ..., a_{r'}]$ and
$[b_l, b_{l+1}, ..., b_r]$.

(c) If $b_i$ is linked to a different point in A than it was before, then record
this change in $\mathcal{M}$.

5. Break the smallest number of links in $\mathcal{M}$ such that there are no more
inversions.
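
Steps 3 and 5 ask for the smallest number of links to be broken so that no inversions remain. Once the curves have been cut open as in Step 1, this is equivalent to keeping a longest strictly increasing subsequence of the B indices when the links are sorted by their A indices. The sketch below assumes links are given as (i, k) index pairs and ignores the cyclic reordering; the function name is hypothetical.

```python
from bisect import bisect_left

def monotonic_links(links):
    """Largest subset of one-to-one (i, k) links with increasing i and k."""
    links = sorted(links)
    tails = []        # tails[n]: smallest final k of a chain of length n + 1
    tail_pos = []     # position in `links` of that chain's last link
    prev = [None] * len(links)
    for pos, (_, k) in enumerate(links):
        n = bisect_left(tails, k)
        if n == len(tails):
            tails.append(k)
            tail_pos.append(pos)
        else:
            tails[n] = k
            tail_pos[n] = pos
        prev[pos] = tail_pos[n - 1] if n > 0 else None
    kept = []
    pos = tail_pos[-1] if tail_pos else None
    while pos is not None:              # walk the chain backwards
        kept.append(links[pos])
        pos = prev[pos]
    return kept[::-1]

# The link (1, 3) is inverted with respect to (2, 1) and (3, 2)
links = [(0, 0), (1, 3), (2, 1), (3, 2), (4, 4)]
kept = monotonic_links(links)
```

Only one link has to be broken in this example, and the remaining links preserve the point ordering on both curves.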

The third step of Algorithm 1 defines a pseudo-distance matrix $\mathcal{D}$ of
normalized Procrustes sums of squares between an approximation of a shape and an
original shape from the training set. A convenient way of obtaining shape
clusters based on $\mathcal{D}$, which is at the same time helpful for cluster prototype
computation, is a k-means type clustering algorithm:

1. Find a seed which is closest to the data. This is done in Step 4a of
Algorithm 1 by finding the shape approximation $S'_{i_0}$ that best fits the current training
set (based on the average distance to the rest of the shapes). $S'_{i_0}$ is going to be
used as a common ground for extracting corresponding sets of points of the same
size from as many training shapes as possible.

2. Extract from the training set and put in a cluster all shapes $S_j$ that fit to
$S'_{i_0}$ (Step 4b).

This cluster extraction procedure continues until all shapes from the training
set have been assigned to a cluster. For each cluster, the cluster prototype is the
Procrustes average of the subsets of registered points extracted from each shape
in the cluster. The cluster variation is defined as the 2n x 2n covariance matrix of
the subsets of points used to compute the prototype (n is the number of cluster
prototype points). This variation is used by the segmentation method to reject
shape deformations that have not been seen in the training set.
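
The 2n x 2n cluster variation can be sketched as the sample covariance of the flattened, already aligned point sets. The input layout (p shapes, each an (n, 2) array) and the use of the unbiased estimator are assumptions; the text does not specify the normalization.

```python
import numpy as np

def cluster_variation(aligned_shapes):
    """2n x 2n covariance of p aligned shapes, each an (n, 2) array of points."""
    X = np.asarray(aligned_shapes, dtype=float)
    p, n, _ = X.shape
    flat = X.reshape(p, 2 * n)             # one row (x1, y1, ..., xn, yn) per shape
    return np.cov(flat, rowvar=False)      # unbiased estimate (divides by p - 1)

# Three toy shapes of two points; only the x coordinate of point 2 varies
shapes = [[[0.0, 0.0], [1.0, 0.0]],
          [[0.0, 0.0], [2.0, 0.0]],
          [[0.0, 0.0], [3.0, 0.0]]]
cov = cluster_variation(shapes)
```

An eigen-decomposition of this matrix (for instance with `np.linalg.eigh`) would give the principal modes of shape variation within the cluster.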
Fig. 1. The 17 neuroanatomical structures of interest (a). Procrustes average of 25
right-ventricle shapes (b) and 28 right-globus pallidus shapes (c) with the scatter of
fits overlaid. The fits of consecutive points are drawn in different shades of gray to
show the accuracy of the registration: consecutive clouds are non-overlapping

Fig. 2. A set of 11 cistern training shapes from different patients was automatically
divided into clusters (main cluster (C1) and three secondary clusters). The registration
of the best fit shape (1179) to cluster C1 is overlaid



4 Experimental Results 

The shape learning method presented above was employed to design a shape
model for 17 brain structures (shown in Fig. 1a) and its performance was assessed
by a quantitative comparison to a manually-identified independent standard. The
training set consisted of observer-defined contours identified by a neuroanatomist
in 28 individual T1-weighted contiguous MR images of the human brain. Figure 2
shows the original manual tracings and clustering results for cistern together
with the best fit shape registration to the main cluster (the sets $T_{i_k,i_0}$
as defined in Algorithm 1). Figures 1(b) and (c) show the Procrustes averages
for the right ventricle and globus pallidus with the scatter of fits overlaid.

In order to obtain a quantitative validation of our results we used the method 
employed in . From each shape model, we manually selected several points that 
were considered most important in defining its shape (the points with the highest 
curvature) and we manually registered them to the training images. We defined 
the ground truth position of these points as the Procrustes average of the manu- 
ally registered points. We computed and compared the root-mean-square (rms) 
distance of manually placed points from the independent standard and the rms 
distance of the automatically registered points from the independent standard, 
respectively. The rms distances for the right ventricle and globus pallidus are
also shown in Figs. 1(b) and (c): for every point selected on each shape, each
distance is displayed on the same y coordinate as the ground truth point it corre- 
sponds to. As a rule, the very high curvature points (the extreme upper or lower 
points) are somewhat better registered manually while the intermediate points 
are better placed automatically. This was expected, since it is very difficult for 
a human to exactly place a point if there are no curvature or other anatomical 
cues. On average, all rms errors are between 0.7 and 1.5 pixels. 
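
The rms comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `rms_distance` and the three landmark coordinates are hypothetical, and the Procrustes averaging used to define the independent standard is not reproduced here.

```python
import numpy as np

def rms_distance(points, reference):
    """Root-mean-square Euclidean distance between matched 2D points."""
    points = np.asarray(points, dtype=float)
    reference = np.asarray(reference, dtype=float)
    squared = np.sum((points - reference) ** 2, axis=1)
    return float(np.sqrt(squared.mean()))

# Hypothetical data (pixels): an independent standard and two sets of placements.
ground_truth = np.array([[10.0, 20.0], [15.0, 25.0], [20.0, 20.0]])
manual = ground_truth + np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
automatic = ground_truth + np.array([[0.5, 0.0], [0.0, 0.5], [0.0, -0.5]])

rms_manual = rms_distance(manual, ground_truth)
rms_automatic = rms_distance(automatic, ground_truth)
```

Comparing `rms_manual` with `rms_automatic` point set by point set mirrors the comparison reported in the text.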

5 Conclusion 

A new fully automated shape learning method was presented. It is based on 
clustering a shape training set in the original shape space and performing a 
Procrustes analysis on each cluster to obtain a cluster prototype and information 
about shape variation. A quantitative analysis of our shape registration approach 
demonstrated results comparable to those obtained by manual registration, 
achieving an average rms error of about 1 pixel. Our approach can serve as 
a valid automated substitute for the tedious and time-consuming manual 
shape analysis. 

Abstract. We describe a method of pairwise 3D surface correspondence 
for the automated generation of landmarks on a set of examples from 
a class of shape. We show how the pairwise corresponder can be used 
in an extension of an existing framework for establishing dense correspondences 
between a set of training examples to build a 3D statistical 
model. The framework relies upon additional algorithms for the production 
of surface paths between vertices on a polyhedral mesh, and these 
are described. An example statistical model is shown for the left lateral 
ventricle of the brain. 



1 Introduction 

We describe a framework and a set of algorithms which may be used for the 
automated landmarking of a class of shapes in 3D. These landmarked shapes 
constitute a set of training examples which may be used to construct a flexible 
template model, an Active Shape Model (ASM). A previous publication [2] 
has described possible solutions to parts of the problem of automatic 3D model 
building. Here we describe a completely automated approach which involves 
extending the previous work and improving the accuracy and robustness of some 
of the algorithms. 

Currently, the construction of an ASM involves the manual identification of 
a set of L landmarks {x_i; 1 ≤ i ≤ L} for each of N training examples of a class 
of shapes. Manual definition of landmarks on a shape has proved to be both 
time-consuming and subjective. Hill et al. have previously described a method 
of non-rigid correspondence in 2D between a pair of closed, pixellated boundaries PI. 
This pair-wise corresponder was used within a framework for automatic 
landmark generation. A similar framework is the basis of the approach to 3D 
automatic landmark generation described here, and consists of the construction 
of a binary tree of merged shapes. Once such a tree has been produced, a 
set of L landmark points may be identified on the root (mean) shape of the tree 
and the positions of these landmarks propagated out to the leaf (example) 
shapes. 

2 Background 

Kambhamettu and Goldgof and Benayoun et al. P| both propose methods 
of surface correspondence based on the minimisation of a cost function which 
involves the difference in the curvature of the surfaces. As pointed out by Tagare 
et al. Pj, curvature is a rigid invariant of shape and its applicability to general 
non-rigid correspondence is problematic. 

Christensen et al. P| propose a method of non-rigid registration by fluid de- 
formation for the matching of brain anatomy in 3D. However, this technique is 
computationally expensive. Szekely et al. |5| parameterise surfaces by a heat dif- 
fusion model and further optimisation. Correspondence may then be established 
between surfaces but relies upon the choice of an origin position on each surface 
mapping and registration of the coordinate systems of these mappings by the 
computation of a rotation. 



3 Polyhedral-Based Correspondence 

The pair-wise correspondence algorithm comprises two stages: 

1. Generation of sparse polyhedral approximations A" and B" of the input 
shapes A and B by triangle decimation, for which {A"} ⊂ {A_i} and {B"} ⊂ {B_i}. 

2. Generation of a corresponding pair of sparse polyhedra A' and B'. This is 
accomplished using a global Euclidean measure of similarity between both 
the sparse polyhedron A" and a subset of labelled vertices from B and 
between B" and a subset from A. 

The sparse polygon generation algorithm makes use of a decimation method 
described by Schroeder et al. 0. However, we use a distance metric which pre- 
serves sharp edges and thin structures. The distance metric, D, is computed 
using Schroeder’s distance to mean plane measure as: 

$$ D(v_0) = |d(v_0) - d'(v_0)| \tag{1} $$

where d(v_0) and d'(v_0) are the signed distances of the vertex v_0 to the mean 
plane of the triangle loop before and after decimation, i.e. d(v_0) = u · (v_0 − x), 
see Fig. 1. 
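
A minimal sketch of this distance metric follows. It assumes the mean plane is the area-weighted average plane of the triangles in the loop (our reading of Schroeder's measure), and that the loop after decimation is the re-triangulated hole left once the vertex is removed. The function names and the small pyramid example are hypothetical.

```python
import numpy as np

def mean_plane(vertices, triangles):
    """Area-weighted mean plane (unit normal, point on plane) of a triangle set."""
    tri = np.asarray(vertices, dtype=float)[np.asarray(triangles)]   # (T, 3, 3)
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])   # 2 * area * normal
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    normal = cross.sum(axis=0)
    normal = normal / np.linalg.norm(normal)
    point = (tri.mean(axis=1) * areas[:, None]).sum(axis=0) / areas.sum()
    return normal, point

def signed_distance(v0, normal, point):
    """Signed distance of v0 to the plane through `point` with unit `normal`."""
    return float(np.dot(normal, np.asarray(v0, dtype=float) - point))

def decimation_metric(v0, vertices, loop_before, loop_after):
    """D(v0) = |d(v0) - d'(v0)| for the loops before and after removing v0."""
    d_before = signed_distance(v0, *mean_plane(vertices, loop_before))
    d_after = signed_distance(v0, *mean_plane(vertices, loop_after))
    return abs(d_before - d_after)

# Hypothetical example: the apex of a shallow pyramid is the removal candidate.
verts = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]], dtype=float)
before = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]   # fan around the apex
after = [(1, 2, 3), (1, 3, 4)]                          # flat re-triangulation
D = decimation_metric(verts[0], verts, before, after)
```

A sharp apex such as this one yields a large D, which is consistent with the stated aim of preserving sharp edges and thin structures.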

We have used a symmetric version of the Iterative Closest Point (ICP) algorithm 
to establish correspondences of the sparse pointset {A"} with the dense 
pointset {B_i} and of the sparse pointset {B"} with the dense pointset {A_i}. Various 
metrics can be used to define the closest distance between point pairs. We 
weight the squared distance between points |x_i − y_j|² by a factor of 2/(n_{x_i} · n_{y_j}), 
where n_{x_i} is the unit surface normal on X at point i. This encourages the correspondence 
of points on the surfaces which are topographically equivalent. We 
label the closest points to A" from B as the pointset {B'}, and the closest points 
to B" from A as the pointset {A'}. 

Fig. 1. Result of applying the decimation algorithm to a triangulated surface of the left 
ventricle of the brain. On the left is a shaded representation of the original dense trian- 
gulation with approximately 2000 vertices. On the right the same surface represented 
by 200 vertices (decimated by 90%) 



A single corresponding pair of sparse polyhedra must be established from 
the two polyhedron/pointset pairs, (A", {B'}) and (B", {A'}). We choose the 
connective description which produces the lowest error in representation, E_r, 
of the sparse decimated polyhedron of each shape by the sparse reconstructed 
corresponding polyhedron of that shape, where 

$$ E_r = \sum \min |Q(A'') - Q(A')|^2 + \sum \min |Q^{-1}(B'') - Q^{-1}(B')|^2 \tag{2} $$

The reconstruction is produced by combining the connectivity description of A" 
or B" with the pointset {B'} or {A'} to produce a pair of matching polyhedra 
with a one-to-one mapping (A' ↦ B'). 



4 Merging Shapes 

Given a pair of corresponding sparse polyhedra A' and B', a local surface 
parameterisation is used to interpolate a dense set of vertices on each. The local 
surface parameterisation is of a single sparse triangle, and is produced by a 
parameterisation of the three surface paths corresponding to its three edges. 

A ‘brushfire’ type distance transform algorithm is used to march the path 
across the surface between dense edges of the triangulation. At each stage, the 
minimisation of a cost C_i(y_0) locates the best next point, x_i, for the surface path 
on a dense triangle edge {y_i; 1 ≤ i ≤ 4} attached to y_0, see Fig. 2. We consider 
not just the path (a, b) which is a sparse polyhedral edge, but also the dense 
polyhedral triangles t_1 and t_2 connected to the dense edge under consideration, 
see Fig. 3. 

Fig. 2. Surface paths are defined on the dense triangulation of the surface by the 
parameterisation of triangle edges 

Fig. 3. The cost function used to produce surface paths is defined in terms 
of a pair of dense polyhedral triangles connected to the dense edge under consideration 

We construct a plane normal to the surface defined by the reference point 
c = (a + b)/2 and by the unit normal b_c, where b_c · (A_1 b_1 + A_2 b_2) = 0, in 
which A_1 and A_2 are the areas of the triangles t_1 and t_2 respectively, and b_1 
and b_2 are the unit normals to these triangles. The cost function of igniting an 
edge y_i from edge y_0 is defined for the intersection, x_i, of the line segment of a 
dense triangle edge with the plane: 

$$ C_i(y_0) = \frac{|a - x'|^2 + |b - x'|^2}{|a - b|^2} \tag{3} $$

where x' is the projection of x_i on the sparse edge (a − b). This cost constrains 
the surface path to lie within the line defined by a and b, thus preventing it 
from looping back around the entire surface of the shape. 
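
The behaviour of this cost can be checked numerically. The sketch below assumes that the projection x' is the orthogonal projection of the candidate point onto the infinite line through a and b; the function names and sample coordinates are hypothetical.

```python
import numpy as np

def project_onto_line(x, a, b):
    """Orthogonal projection of x onto the line through a and b."""
    x, a, b = (np.asarray(p, dtype=float) for p in (x, a, b))
    ab = b - a
    s = np.dot(x - a, ab) / np.dot(ab, ab)
    return a + s * ab

def path_cost(x, a, b):
    """(|a - x'|^2 + |b - x'|^2) / |a - b|^2, with x' the projection of x."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    xp = project_onto_line(x, a, b)
    return float((np.sum((a - xp) ** 2) + np.sum((b - xp) ** 2)) / np.sum((a - b) ** 2))

a, b = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
cost_mid = path_cost((1.0, 0.5, 0.3), a, b)      # projects onto the midpoint
cost_end = path_cost((0.0, 0.7, 0.0), a, b)      # projects onto the end point a
cost_beyond = path_cost((3.0, 1.0, 0.0), a, b)   # projects beyond b
```

Under this reading the cost lies between 0.5 and 1 whenever the projection falls between a and b, and exceeds 1 otherwise, which is how it discourages paths that wander away from the sparse edge.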

The connectivities of A' and B' are identical. Therefore, we can correspond 
the individual sparse triangles of the polyhedra. These sparse triangles are split 
recursively using surface paths to some depth to produce the dense triangula- 
tion Aj and B^. Now a densely triangulated mean shape may be generated by 
averaging the geometric information of these dense triangulations to produce a 
pointset {Ci} and this is combined with the connectivity from to produce a 
densely triangulated polyhedron C. 



5 Automated Landmarking 

The pairwise corresponder described above is used to build a binary tree of 
merged shapes with a single mean shape at the root and the examples from the 
training set at the leaves. We produce a set of landmarks {C'_i} ⊂ {C_i} on the 
mean shape. The connectivity of these points is defined by the sparse polyhedron 
C'. These landmark points are then propagated down the branches of the tree. 

At each branch of the tree, each of the landmark points can be projected 
onto a triangle of the sparse version of the mean shape C' which is the mean 
of A' and B'. The sparse triangle is then parameterised along a baseline and 
a vector between the baseline and opposite vertex, see Fig. 4. The projection 
e on the triangle (a, c, b) is now uniquely defined by the parameter pair (t, u). 
There is a correspondence between the vertices of this sparse triangle (a, c, b) 
on C' and the vertices of a pair of sparse triangles on A' and B'. Call the sparse 
corresponding vertices on A' (a', c', b'). The projection point e can therefore 
be mapped onto the sparse triangle (a', c', b'), by parameterising it in t at u to 
give e'. 

We must now reconstruct a point on the surface of A which corresponds to 
the projection point e' mapped onto a sparse triangle. We do this by constructing 
surface paths using the method of Section 4, again see Fig. 4. Finally we choose 
the landmark on the dense surface as the vertex with smallest Euclidean distance 
to the reconstruction of e'. 
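
The mapping of a projected point between corresponding sparse triangles can be sketched as follows. The sketch assumes that the baseline is the edge from a to b, that t measures position along that baseline, and that u measures position along the vector from the baseline point to the opposite vertex c. These conventions, the function names, and the coordinates are assumptions made for illustration only.

```python
import numpy as np

def triangle_params(e, a, b, c):
    """Parameters (t, u) such that e = (1 - u) * (a + t * (b - a)) + u * c."""
    e, a, b, c = (np.asarray(p, dtype=float) for p in (e, a, b, c))
    basis = np.column_stack((b - a, c - a))                  # 3 x 2 system
    (beta, gamma), *_ = np.linalg.lstsq(basis, e - a, rcond=None)
    u = gamma
    t = beta / (1.0 - gamma) if gamma < 1.0 else 0.0         # t is undefined at c itself
    return float(t), float(u)

def map_point(t, u, a, b, c):
    """Point with parameters (t, u) on the triangle with baseline (a, b), apex c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    base = a + t * (b - a)
    return base + u * (c - base)

# Triangle on the mean shape and its hypothetical counterpart on shape A'.
a, b, c = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
a2, b2, c2 = (0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (0.0, 2.0, 1.0)

t, u = triangle_params((0.25, 0.5, 0.0), a, b, c)
e_mapped = map_point(t, u, a2, b2, c2)
```

The final step in the text, choosing the dense vertex closest to the reconstructed point, would follow once the surface paths across the baseline and from the opposite vertex have been built; that step is not shown here.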




Fig. 4. Projected points are 
reconstructed on dense sur- 
faces by the parameterisation 
of surface paths constructed 
across the baseline and from 
the opposite vertex of a sparse 
triangle 




Fig. 5. A group of four left brain ventricle exam- 
ples and their densely triangulated mean at the third 
level of the tree of merged pairs used to generate a 
set of eight landmarked examples 



6 Results 

We have generated a 3D statistical model from eight complex biological shapes: 
left ventricles of the brain. These have been defined by hand as contours on a 
series of 2D slices from 3D Magnetic Resonance images. A grouping of four of 
the eight examples and their mean at the third level of the tree of merged shapes 
are illustrated in Fig. 5. The example shapes consisted of ≈ 2000 vertices, upon 
which were placed 200 landmark points. The first two modes of variation of this 
model are illustrated in Fig. 6; b_1 explains 43% of the total variation, and b_2 
explains 16%. 

7 Conclusions 

We have presented a novel method for the correspondence of two faceted (triangulated) 
surfaces. The method is based on the production of a sparse polyhedral 
representation of one shape and matching this to a sparse pointset representation 
of the other. No curvature estimation of either surface is required. The only control 
parameter of the algorithm, the target number of vertices during decimation, 
is not critical and can be automated at the cost of decimating each surface twice. 
The use of this algorithm to produce a binary tree of merged shapes, and the 
method we describe for accurate propagation of landmarks from the root to the 
leaf shapes of the tree, provide a framework for the automated landmarking of 
the input example shapes necessary for the production of a 3D statistical model. 

Fig. 6. Shape instances generated using a 3D PDM of eight left brain ventricles showing 
the number of s.d.s (−1.0 to +1.0) from the mean shape. The model consists of 200 points 

Abstract. The goal of this work is to develop an approach to shape 
representation and classification that will allow us to detect and quantify 
differences in shape of anatomical structures due to various disorders. 
We used a robust version of skeletons for feature extraction and linear 
discriminant analysis (the Fisher linear discriminant and the linear Sup- 
port Vectors method) for classification. We propose a way to map the 
classification results back into the image domain, interpreting shape dif- 
ferences as a deformation required to bring a shape from one class to the 
other. An example of analyzing corpus callosum shape in schizophrenia 
is reported, as well as the results of the study of the statistical properties 
of the classifier using cross validation techniques. 



1 Introduction 

Our goal is to build a framework for statistical shape analysis using classification 
techniques applied to feature descriptors. We perform shape feature extraction 
using skeletons. To make the process of skeleton extraction robust to noise and 
quantization effects of segmentation, we have developed a new variation of the 
traditional skeletons: fixed topology skeletons. 

In this paper, we limit ourselves to linear discriminant analysis, comparing 
performance of two different linear classification methods: the Fisher linear dis- 
criminant and the linear Support Vectors methods. Then we present the shape 
differences between the groups by constructing the shape deformation in the 
image space that corresponds to the discriminant vector in the feature space. 

We tested the approach on corpus callosum data for schizophrenia patients. 
The results are reported in Sect. 3. 

Related Work. Statistical shape modeling combines shape representation with 
statistical information on how the features vary across the population. Principal 
Component Analysis (PCA) has been used by several authors for capturing statistical 
properties of the model |2I0|. It was well suited for applications in segmentation 
and object localization, where the statistical properties of the model 
were used to restrict the space of possible deformations of the model. It has 
also been used in shape analysis |4I8| to reduce the dimensionality of the model 
and find a decision boundary between the classes. Bookstein [5] used the shape 
features to align the outlines, but then the features (points along the outline) 
were analyzed independently of each other. We attempt to use traditional classification 
methods directly (without going through the dimensionality reduction 
step) to find the decision boundary. 

We use a novel approach to robust skeleton estimation for feature extraction. 
Skeletons were introduced in general computer vision several decades ago 
and have been used extensively for object recognition and localization. In medical 
image analysis, a scale-space variation of skeletons was introduced and used in 
various applications by Pizer and colleagues 0 . 

2 Shape Representation: Fixed Topology Skeletons 

Skeletons provide a compact, intuitive representation of a shape that can be 
used for segmentation, tracking, object recognition, etc. Their major drawback 
is their high sensitivity to noise in the boundary. Many ways have been proposed 
to stabilize the skeleton extraction, most of which concentrated on 
heuristics for pruning the original, noisy skeleton. 

For shape analysis of anatomical structures, the general shape of the object 
is well known ahead of time and the deformations of interest are very small and 
do not change the global shape of the structure. Fixed topology skeletons take 
advantage of this fact: we fix the structure of the skeleton graph (the skeleton 
topology) and optimize for the accuracy of the original shape representation over 
all skeletons of that fixed structure. 

Skeleton extraction. For computing the fixed topology skeleton of a shape, 
we use a distance map, a function that for every point in the image is equal to 
the distance from the point to the closest point on the boundary of the object. 
It can be shown that the skeleton is the set of ridge points of the distance map. 

We use a snake-like approach for computing the fixed topology skeleton of a 
shape. The set of skeleton points defines a continuous curve that represents the 
skeleton. We initialize the snake at the end-points of the traditionally defined 
skeleton m, and then use the distance map gradient to "drive" the snake. 
Additional regularization is required to keep the curve smooth. Formally, the 
update rule is 

$$ x^{t+1} = a(x^t + \nabla D(x^t)), $$

where x^t is the set of point coordinates on the curve at time t, ∇D is the gradient 
of the distance map computed at the locations corresponding to the points of 
the curve, and a is the smoothing operator. The curve has to be resampled every 
few iterations to maintain uniform distribution of the points along the curve. We 
stop the iterations when the curve starts oscillating around the ridge. 
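
The update rule can be illustrated on a toy shape. The sketch below is not the authors' implementation: it assumes a brute-force distance map, nearest-pixel sampling of the gradient, a three-point moving average as the smoothing operator, and a step size of 0.5 (the rule in the text has no explicit step). Resampling of the curve is omitted. All names are hypothetical.

```python
import numpy as np

def distance_map(mask):
    """Brute-force distance from every pixel to the nearest background pixel."""
    background = np.argwhere(~mask).astype(float)
    pixels = np.argwhere(np.ones(mask.shape, dtype=bool)).astype(float)
    diff = pixels[:, None, :] - background[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1).reshape(mask.shape)

def smooth(curve):
    """Three-point moving average; the two end points are left unsmoothed."""
    out = curve.copy()
    out[1:-1] = 0.25 * curve[:-2] + 0.5 * curve[1:-1] + 0.25 * curve[2:]
    return out

def evolve(curve, dist, n_iter=50, step=0.5):
    """Repeat x <- smooth(x + step * grad D(x)) with points stored as (row, col)."""
    grad_row, grad_col = np.gradient(dist)
    for _ in range(n_iter):
        idx = np.rint(curve).astype(int)
        r = np.clip(idx[:, 0], 0, dist.shape[0] - 1)
        c = np.clip(idx[:, 1], 0, dist.shape[1] - 1)
        grad = np.column_stack((grad_row[r, c], grad_col[r, c]))
        curve = smooth(curve + step * grad)
    return curve

# Toy shape: a horizontal bar whose ridge is the centre row (row 10).
mask = np.zeros((21, 61), dtype=bool)
mask[5:16, 5:56] = True
dist = distance_map(mask)

s = np.linspace(0.0, 1.0, 31)
initial = np.column_stack((10.0 - 3.0 * np.sin(np.pi * s), 15.0 + 30.0 * s))
final = evolve(initial, dist)
```

In this toy case the bowed initial curve settles within half a pixel of the ridge; a fixed iteration count stands in for the oscillation-based stopping rule described in the text.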

Fig. 1. Skeleton extraction. (a) and (b) show the distance map (darker color corresponds 
to higher distance from the boundary) and the skeleton extracted for two 
different cases from the data set; (c) features used for classification: curvature angle 
and shape width 



To find the best skeleton, we estimate skeletons for different initial pairs of 
points and choose the one that describes the shape the best (0 contains more 
details on the algorithm). Figs. 1(a) and (b) show corpus callosum skeletons 
computed for two different cases in our data set. 

Feature extraction. Once the skeleton is computed, we sample the skeleton 
curve uniformly by arc length and measure two values at every sample point 
(Fig. 1(c)): the angle between two adjacent segments in the sampled skeleton and 
the shape width at the sample point. These two features are invariant under rigid 
transformations and are therefore well suited for shape description. The number 
of sampling points on the skeleton determines the level of detail captured by the 
feature vector. 
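
A sketch of the feature computation follows. It assumes the skeleton is given as a polyline of (row, col) points, that the angle feature is the signed turning angle between adjacent segments, and that the shape width is twice the distance-map value at the nearest pixel. These are our assumptions, and the function names are hypothetical.

```python
import numpy as np

def resample_by_arc_length(curve, n):
    """Resample a polyline at n points spaced uniformly by arc length."""
    curve = np.asarray(curve, dtype=float)
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))
    targets = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(targets, s, curve[:, k])
                            for k in range(curve.shape[1])])

def turning_angles(samples):
    """Signed angle between consecutive segments, wrapped to [-pi, pi)."""
    d = np.diff(samples, axis=0)
    heading = np.arctan2(d[:, 1], d[:, 0])
    turn = np.diff(heading)
    return (turn + np.pi) % (2.0 * np.pi) - np.pi

def widths(samples, dist):
    """Shape width, taken here as twice the distance-map value at each sample."""
    idx = np.rint(samples).astype(int)
    return 2.0 * dist[idx[:, 0], idx[:, 1]]

# Hypothetical L-shaped skeleton and a constant distance map.
skeleton = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]])
dist = np.full((11, 11), 3.0)

samples = resample_by_arc_length(skeleton, 5)
angles = turning_angles(samples)
features = np.concatenate((angles, widths(samples, dist)))
```

Increasing the number of samples `n` lengthens the feature vector and captures finer detail, as noted in the text.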

3 Classification Results 

Classification methods. We tested two different linear discriminant tech- 
niques on the same data set, namely the Fisher discriminant function and the 
linear Support Vectors classifier m. Given two classes of feature vectors {x}, 
any linear learning method searches for a weight vector w that maximizes the ‘spread’ 
between the projected points w^T x. The difference between different linear 
techniques is in how they define spread, or separation, between the classes. 

To find an optimal number of features, we use cross-validation. Since our 
data set is small, we had to resort to leave-one-out cross-validation: one case 
was left out of the training set and then used as a test set. Repeated for all the 
cases in the data set, this yields an estimate of the generalization accuracy of 
the method. We report cross-validation results later in this section. 
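
Leave-one-out cross-validation with a Fisher linear discriminant can be sketched as below. The data are synthetic and only mimic the group sizes of the study; the small ridge term `reg` and the midpoint threshold are assumptions made to keep the example well defined, and the function names are hypothetical.

```python
import numpy as np

def fisher_direction(X, y, reg=1e-6):
    """Fisher discriminant direction w and a midpoint decision threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    within = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    within += reg * np.eye(X.shape[1])          # keeps the scatter matrix invertible
    w = np.linalg.solve(within, m1 - m0)
    threshold = 0.5 * (w @ m0 + w @ m1)
    return w, threshold

def leave_one_out_accuracy(X, y):
    """Train on all cases but one, test on the held-out case, repeat for all cases."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, threshold = fisher_direction(X[keep], y[keep])
        predicted = int(X[i] @ w > threshold)
        hits += int(predicted == y[i])
    return hits / len(y)

# Synthetic, well-separated feature vectors: 30 cases in one class, 36 in the other.
rng = np.random.default_rng(0)
X = np.vstack((rng.normal(-3.0, 1.0, (30, 4)), rng.normal(3.0, 1.0, (36, 4))))
y = np.array([0] * 30 + [1] * 36)
accuracy = leave_one_out_accuracy(X, y)
```

Repeating this for feature vectors of different lengths and keeping the length with the highest held-out accuracy corresponds to the model selection described above.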

Data. We tested our approach on corpus callosum images for two groups: 
schizophrenia patients and normal controls. We used two data sets, combined 
into one in our experiments (see Acknowledgments for more info). The combined 
data set contains scans of 30 schizophrenia patients (SZ) and of 36 normal con- 
trols (NC). We also performed testing on those data sets separately with results 
very similar to those obtained with the combined data set. 

Fig. 2. Classification results based on 20 feature points: (a) separation between the 
two groups when projected onto w and (b) weights (components of w) for the features 
along the curve, in the posterior-to-anterior order; (c) deformation implied by the 
discriminant vector, applied to the mean of NC group (top) and to an individual case 
(bottom). Black corresponds to the original shape, gray indicates the result of the 
deformation 



Classification results. Figure 2(a) shows the results of Support Vectors classification 
using 20 points along the skeleton. We can see that for this number of 
features, a perfect separation between the two classes was achieved. Figure 2(b) 
shows the weights corresponding to the angle features (ordered from posterior to 
anterior). The weights change smoothly as we move along the skeleton, and most 
of the weight is concentrated in the middle part of the skeleton. This suggests 
that the middle ridge is where most of the shape differences take place in this 
case. 

We can also provide a direct interpretation of this result in the image domain. 
Since projecting onto weight vector w separates the two classes, negating the 
component of any feature vector from the original data set along w should 
bring that vector over the threshold into the other class: 

$$ x = x_\perp + (w^T x)\,w, $$

$$ \tilde{x} = x_\perp - (w^T x)\,w. $$

We can apply this operation to any data point x_i in one of the classes and 
then reconstruct the skeleton using the resulting feature vector x̃_i. Thus linear 
classification in the feature domain can be mapped into a shape deformation in 
the image domain. 
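
Negating the component of a feature vector along the discriminant direction amounts to a reflection. The sketch below assumes that w is normalised to unit length and that the decision threshold sits at zero along w (for example, after centring the features); both are assumptions, as are the names and numbers.

```python
import numpy as np

def deform_across_boundary(x, w):
    """Negate the component of x along the unit-normalised direction w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)
    along = np.dot(w, x)            # signed projection onto w
    x_perp = x - along * w          # component orthogonal to w
    return x_perp - along * w

w = np.array([3.0, 4.0])            # hypothetical discriminant direction
x = np.array([2.0, 1.0])            # hypothetical feature vector
x_deformed = deform_across_boundary(x, w)
```

The deformed vector keeps its orthogonal component and lands at the mirrored position on the other side of the boundary; reconstructing a skeleton from it gives the deformation shown in the image domain.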

Figure 2(c) shows the deformation applied to two different skeletons. The 
first example (top) shows a ‘mean’ normal control skeleton. It was constructed 
by averaging the features at the 20 points along the skeleton and reconstructing 
a skeleton from the resulting feature vector. The second example (bottom) shows 
a skeleton for one of the normal control subjects with the deformation implied 
by the classifier. We can see that the corpus callosum shape is more ‘bent’ for 
the schizophrenia group. In other words, we would have to bend the normal corpus 
callosum further to make it look more like the corpus callosum of a schizophrenia 
patient. 

Cross-validation. Figure 3(a) shows learning accuracy, that is, the classification 
accuracy when the test set was the same as the training set. We can see that 
the Support Vectors method outperforms the Fisher linear discriminant, which 
we believe is because it makes fewer assumptions on the underlying distributions 
of the classes. As the number of feature points used for classification grows, the 
data becomes more separable and the accuracy improves. 

Fig. 3. Cross-validation results: (a) learning accuracy; (b) cross-validation accuracy 

Cross-validation is used to find the optimal number of feature points to be 
used for shape description of the corpus callosum, as well as to test the generalization 
power of the classifier. Figure 3(b) shows the classification accuracy for the 
leave-one-out cross-validation experiment. The dotted line shows the ‘baseline’, or 
the classification accuracy one would get by guessing. We can see that both 
methods achieve better-than-guessing accuracy. The best accuracy was achieved 
by the Support Vectors method for 20 feature points. Thus that was reported as the 
best number of points. 

The classification accuracy for cross-validation is significantly lower than for 
learning. There are several reasons for that. As the number of feature points 
grows, the data becomes more sparse in the feature space, and thus it is easier 
to separate between the classes, but we get poor generalization, as new examples 
fall into previously empty regions of the feature space. Another reason for lower 
testing accuracy could be that the classes are not truly separable.¹ 

Another question that should be addressed is the number of features. It seems 
that the optimal number of features is comparable with the number of cases in 
the data set. But it does not mean that we are fitting a model with that many 
independent parameters to the data. In fact, the features highly correlate with 
their neighbors along the skeleton. Another point to confirm this is the fact that 
adjacent points on the skeleton get similar weights (Fig. 2(b)). 

4 Conclusions & Acknowledgments 

We presented an approach for shape based classification of anatomical structures. 
It uses statistical learning techniques for investigating the differences between 
two groups of examples of the same anatomical structure. In this work, 
we limited ourselves to using linear classifiers. We tested two different linear 
classification techniques: the Fisher linear discriminant and the linear Support 
Vectors classification. 

¹ Implying that one could not provide a reliable diagnosis of schizophrenia based on 
the shape of the corpus callosum alone, but only an estimate that is about 70% accurate. 
But combined with analysis of other structures, it might provide a significant improvement in 
detecting and quantifying shape pathologies in the brain of schizophrenia patients. 

The shape representation is also a crucial component of the system. It maps 
the images into points in the feature space in which the classification is per- 
formed, and also provides an interpretation of the classification results in terms 
of the shape deformation. We use skeletons for extracting the shape features. 
They provide a robust, intuitive representation of the shape, and are capable of 
capturing shape variations between the groups reported in the paper. 

Based on the experimental results, we conclude that the shape of the corpus 
callosum differs in schizophrenia, exhibiting higher curvature. The 
cross-validation provided the optimal number of feature points, as well as an 
estimate of the classification accuracy on new examples. 

Abstract. Three dimensional models of anatomical structures are currently 
used to aid in medical diagnosis, treatment, surgical guidance, and 
surgical simulation. Limitations on the resolution of medical scans can 
cause artifacts to appear in the models that do not exist in the patient’s 
anatomy. The most severe artifacts occur due to the low sampling rate 
between image slices of a scan. This paper describes a method of com- 
bining two orthogonal scans to generate a model with higher resolution 
than models created from either of the scans alone. The two scans are 
first registered to each other and then a net of linked surface nodes is 
initialized for each of the scans. The nodes from the two nets are then 
merged and relaxed, subject to constraints set by the resolution of each 
scan. This generates a smooth surface representation which stays faithful 
to the original binary data. 



1 Introduction 

The generation of three-dimensional models of anatomical structures from med- 
ical imagery is important for applications such as surgical simulation, planning, 
and image-guided surgery. An internal scan typically consists of high-resolution 
data in the imaging plane and significantly lower resolution between imaging 
slices. The lack of high-resolution information along the scanning direction causes 
aliasing or terracing artifacts in anatomical surface models, which can be distracting 
or misleading to surgeons. For surgical simulation, the terraces detract 
from the realism of the visualization and create very noticeable ridges when using 
haptics to feel the object’s surface. These terracing artifacts can be reduced 
by increasing the resolution of the scan. However, for CT scans, higher resolu- 
tion between imaging planes subjects patients to a higher dose of radiation. For 
MR scans, longer scan times are necessary to achieve higher resolution, which 
is more costly and is more difficult for the patient, who must remain absolutely 
still during image acquisition. 

For clinical practice, scans are usually acquired in more than one orthogonal 
direction. For example, instead of acquiring a single very high resolution 
sagittal MR scan, lower resolution sagittal and axial scans may be acquired (see 
Fig. 1(a, b)). Surgeons and radiologists use information from both acquisitions for 
diagnosis, surgical guidance, and treatment. Similarly, we are interested in combining 
the information from two scans to produce three dimensional models of 
internal structures that have higher resolution than models created from either 
of the scans alone. The method proposed here is an extension of the Constrained 
Elastic SurfaceNet described in [5], which generates models from a single scan. 

Fig. 1. (a, b) Two MR scans of a person’s knee. Both images have high resolution 
in-plane, but have about one quarter the resolution between planes. (c) A single SurfaceNet 
is built around an object and then relaxed, producing a smooth surface, free 
of terracing. (d) Two nets are built from two orthogonal scans and relaxed 

2 Previous Work 

Two basic methods are commonly used to fit surfaces to binary data. In the first, 
the binary data is low-pass filtered, and an algorithm such as Marching Cubes 
is applied, where the surface is built through each surface cube at an iso-surface 
of the grey-scale data Pj. To remove terracing artifacts and reduce the number 
of triangles in the model, surface smoothing and decimation algorithms can be 
applied. However, because these procedures are applied to the surface without 
reference to the original segmentation, they can result in loss of fine detail. 

In the second general method for fitting a surface to binary data, the binary 
object is enclosed by a parametric or spline surface. Control points on the surface 
are moved towards the binary data in order to minimize an energy function based 
on surface curvature and distance between the binary surface and the parametric 
surface. This approach has two main drawbacks for general applications. 
First, it is difficult to determine how many control points will be needed to 
ensure sufficient detail in the final model. Second, this method does not handle 
complex topologies easily. 

Recently, Gibson [2] introduced Constrained Elastic SurfaceNets which fit an 
elastic net of nodes over the surface of a binary segmented dataset and moved 
the node positions to reduce the surface curvature while constraining the net to 
remain within one voxel of the binary surface. This approach produces smooth 
surface models from binary segmented data that are faithful to the original 
segmentation. 

3 Dual SurfaceNets 



Dual SurfaceNets extend the original SurfaceNet approach by combining infor- 
mation from two orthogonal volume image scans. The use of Dual SurfaceNets 
requires a number of preprocessing steps [3]. First, the object of interest is segmented 
or extracted from each of the scans. The scans are then registered into a 
common coordinate frame by finding the pose that minimizes the sum of squared 
differences of the smoothed segmented images. 

Once segmentation and registration are performed, a SurfaceNet is initialized 
for each of the models. The first step in generating a SurfaceNet is to locate cells 
that contain surface nodes. A cell is defined by 8 neighboring voxels in the binary 
segmented data, 4 voxels each from 2 adjacent planes. If at least one of the voxels 
has a binary value that is different from its neighbors, then the cell is a surface 
cell. The net is initialized by placing a node at the center of each surface cell and 
linking nodes that lie in adjacent surface cells. Figure Dt illustrates the creation 
of a net from a binary image. 
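
The initialization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `surface_cells` and `init_net` are hypothetical, and "adjacent" is assumed here to mean cells that share a face.

```python
import numpy as np

def surface_cells(volume):
    """Mark cells whose 8 corner voxels do not all share the same binary value.

    Cell (i, j, k) has corners volume[i:i+2, j:j+2, k:k+2].
    """
    v = np.asarray(volume).astype(bool)
    # Stack the 8 corner voxels of every cell along a new leading axis.
    corners = np.stack([
        v[di:v.shape[0] - 1 + di, dj:v.shape[1] - 1 + dj, dk:v.shape[2] - 1 + dk]
        for di in (0, 1) for dj in (0, 1) for dk in (0, 1)
    ])
    # A surface cell mixes inside and outside voxels.
    return corners.any(axis=0) & ~corners.all(axis=0)

def init_net(volume):
    """Place one node at the center of each surface cell and link neighbors.

    Assumption: two surface cells are linked when they share a face.
    """
    cells = np.argwhere(surface_cells(volume))
    index = {tuple(c): n for n, c in enumerate(cells)}
    nodes = cells + 0.5  # cell centers, in voxel coordinates
    links = [[] for _ in cells]
    for c, n in index.items():
        for axis in range(3):
            for step in (-1, 1):
                nb = list(c)
                nb[axis] += step
                m = index.get(tuple(nb))
                if m is not None:
                    links[n].append(m)
    return cells, nodes, links
```

For a volume containing a single foreground voxel, every cell touching that voxel is a surface cell, so the net starts as a small closed box of eight nodes around it.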

Once defined, the SurfaceNet can be relaxed to reduce terracing artifacts 
while remaining faithful to the input segmentation 0. To relax the net, each 
node is repositioned to reduce an energy measure in the links. In the examples 
presented here, SurfaceNets were relaxed iteratively by considering each node, 
p[i], in sequence and moving that node towards the midpoint of its linked neighbors: 

$$p[i] = \frac{1}{|N(i)|} \sum_{j \in N(i)} p[j], \qquad (1)$$

where N(i) is the set of linked neighbors of point i. Defining the relaxation in 
this manner without constraints will cause the net to shrink to a single point. 
To remain faithful to the original segmentation, each node is constrained to lie 
inside its original surface cell. This constraint favors the original segmentation 
over smoothness and forces the surface to retain thin structures and cracks. 
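
A minimal sketch of this constrained relaxation is given below. It assumes unit-sized, axis-aligned surface cells whose lower corners are stored in `cells`, and it moves each node all the way to the neighbor midpoint before clamping; the function name `relax` and the argument layout are illustrative choices rather than part of the original method description.

```python
import numpy as np

def relax(nodes, links, cells, iterations=10):
    """Move each node to the midpoint of its linked neighbors, clamped to its cell."""
    p = np.array(nodes, dtype=float)
    lo = np.asarray(cells, dtype=float)   # lower corner of each surface cell
    hi = lo + 1.0                         # upper corner (unit cells assumed)
    for _ in range(iterations):
        for i, nbrs in enumerate(links):  # nodes are visited in sequence
            if not nbrs:
                continue
            target = p[nbrs].mean(axis=0)         # midpoint of linked neighbors
            p[i] = np.clip(target, lo[i], hi[i])  # stay inside the original cell
    return p
```

Without the `np.clip` call the net would contract towards a point, which is the shrinking behavior noted above.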

Relaxing a single SurfaceNet of an object significantly reduces the artifacts 
contained in the model. However, if the resolution in the scan is low in one 
direction, there may not be enough information in one scan to fully constrain the 
model and remove the terraces. We therefore consider using two scans, where one 
has higher resolution along the direction where the other has lower resolution, as 
illustrated in Fig. 1. To relax two models of an object together, the individual 
SurfaceNets are built as described above. The two SurfaceNets, once aligned in 
the same coordinate frame, are iteratively relaxed towards one another with the 
constraint that each node must lie within its surface cell. In one relaxation step, 
each point p[i] in the first net is updated by taking an average (weighted by 
distance) of the points q[j] in the other net. 



$$p[i] = \frac{\sum_j w(p[i], q[j])\, q[j]}{\sum_j w(p[i], q[j])}, \qquad \text{where} \quad w(u, v) = \ldots \qquad (2)$$



The point p[i] could violate its constraint by lying outside its cell, c[i]. The new 
position of the point, p'[i], is p[i] if it lies inside the cell and the closest point on 
the cell boundary if p[i] lies outside the cell. In the next iteration, the second 
net is relaxed towards the first. After each full dual relaxation step, the nets are 
each relaxed individually for one iteration. The individual relaxation keeps each 
net smooth as they merge. The iteration progresses until the positions of some 
user-defined fraction of the nodes have converged, at which time one of the two 
nets is chosen to generate the final triangle model. 
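
One dual relaxation step can be sketched as below. The weight function is not recoverable from the text here, so a Gaussian falloff with distance (parameter `sigma`) is used purely as a stand-in assumption; the function name `dual_step` and the axis-aligned cell bounds `cell_lo` and `cell_hi` are likewise hypothetical.

```python
import numpy as np

def dual_step(p, q, cell_lo, cell_hi, sigma=1.0):
    """Pull each node of net p toward a distance-weighted average of net q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Squared distances between every p[i] and q[j]; shape (len(p), len(q)).
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=2)
    # Assumed weight: Gaussian falloff. Subtracting the row minimum only
    # rescales each row, so the normalized average is unchanged.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    target = (w @ q) / w.sum(axis=1, keepdims=True)
    # For an axis-aligned box, clipping yields the closest point of the cell.
    return np.clip(target, cell_lo, cell_hi)
```

Calling `dual_step(p, q, ...)` and then `dual_step(q, p, ...)` with the second net's cells corresponds to one full dual step; the individual relaxation of each net would follow.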

If the segmentation and registration were ideal, then the true surface would 
always lie in the intersection of the surface cells of the two images. In this case, 
the two nets would converge on the identical surface with all surface cell 
constraints satisfied. Figure 1 shows a 2D example of a surface passing through 
the surface cells of two nets. In general, the surface cells of the two scans do 
not overlap perfectly due to imaging, segmentation, and registration errors. We 
therefore provide a means of relaxing the constraints to allow the nets to merge 
more closely. After a few iterations, any point that is pulled outside its con- 
straining cell cannot meet a corresponding point in the other net. This signifies 
discrepancies between the two models. In these instances, the constraining cell 
of every such point is dilated (preserving aspect ratio) by a small amount at the 
end of the iteration, allowing those points to move closer to the other net in the 
next iteration. Although the resultant net can move more than one voxel from 
the segmentations, the final model is guaranteed to be between the two initial 
models. 

4 Results 

Results of the dual relaxation are shown in Fig. 2. One scan of a femur was 
acquired axially at a resolution of 0.27 mm x 0.27 mm x 1.00 mm. The other scan 
(of the same person) was acquired sagittally (one year later) at a resolution of 
0.25 mm x 0.25 mm x 1.40 mm. The femur was segmented manually from both 
images.¹ Figures 2(a) and (b) show the results of running Marching Cubes, 
individual SurfaceNets [2], and Dual SurfaceNets on the images. No decimation 
was performed on any of the models. Notice the terracing artifacts in the models 
generated with Marching Cubes and individual SurfaceNets along the direction 
that the scans were acquired. The model generated using Dual SurfaceNets on 
both scans preserves the fine details in the original scans well but does not 
contain the terraces. 

In the second example, we consider building a model from extremely low 
resolution scans. Figure 2(c) shows results of model generation from subsampled 
versions of the original segmentations. The axial and sagittal scans were 
subsampled by a factor of 4 to resolutions of 1.09 mm x 1.09 mm x 4.00 mm and 
1.00 mm x 1.00 mm x 5.60 mm respectively. The model generated using Dual 
SurfaceNets at the low resolution contains slightly less detail than the high 
resolution version, but it is remarkably smooth and free of terracing artifacts, while 
remaining faithful to the original segmentations. The surface models can be 
visually verified by superimposing the relaxed net on the image data. Figure 2(d) 
shows the input segmentations to the Dual SurfaceNet algorithm and the final 
result of the net. Despite the blockiness evident in all the input segmentations, 
the final models are very smooth and capture the details of the femur. 

¹ These datasets were provided by the Surgical Planning Lab of Brigham and Women’s 
Hospital. 

Fig. 2. Results of Model Generation. Panels are labeled Marching Cubes, 
SurfaceNets, and Dual SurfaceNets. 

(a) Surface models of the high resolution scan of the femur. The top two images 
were generated from the axial scan, while the bottom two were generated from the 
sagittal scan, using Marching Cubes and SurfaceNets. The larger model, generated 
using Dual SurfaceNets, combined the information from both scans. 

(b) Another view of the same models as shown in (a). Notice the terracing 
artifacts in the Marching Cubes and SurfaceNets models, which are most visible 
when the view is along the slicing direction. 

(c) Surface models of a low resolution (subsampled) scan of the femur. The model 
generated using Dual SurfaceNets at the low resolution contains slightly less 
detail than the high resolution version, but it is free of terracing artifacts and 
is faithful to the original segmentations. 

(d) Models of the femur overlaid on the original grayscale data. The top row of 
images are the original high resolution scans. The bottom row of images are the 
subsampled scans. The first two images in each row are the segmentation input to 
the Dual SurfaceNets. The result of relaxing the nets is shown in the third image. 

5 Validation 

Since three dimensional models of anatomical structures are now routinely used 
by surgeons, there is a clear need to validate the process by which such models 
are generated. One method of validating the result of relaxing Dual SurfaceNets 
is by visual inspection. The 3D model can be superimposed onto the original 
grayscale image, as shown in Fig. 2(d). The borders of the model can be confirmed 
by examining each slice of the image. 

In SurfaceNets, each node of the model is guaranteed to lie within one voxel 
of the original binary segmentation. Dual SurfaceNets can uphold the same con- 
straints, but in practice these constraints need to be relaxed slightly to effectively 
combine the information in both nets. The distance that a node strays from its 
initialization point (the center of its cell) can be constrained during the relax- 
ation. Furthermore, upon convergence, the distribution of displacements can be 
analyzed to determine the goodness of the fit. For both nets of the femur data 
set, over 97% of the points lie within one voxel of their starting position. There- 
fore, the final model is not only very smooth, but also faithful to the input 
segmentation (see [3] for more details). 

The validation process is often hindered by the difficulty in obtaining ground 
truth. While we do not have explicit ground truth, we generated the low resolu- 
tion femur model and then compared the result with the high resolution femur 
segmentation. Ideally, each point of the low resolution model should fall near the 
high resolution surface. Even though the voxel extents of the axial and sagittal 
scans used in generation of the nets are 4.28 mm and 5.78 mm respectively, 
the majority of model points fall within one millimeter of the high resolution 
model. Furthermore, 98% lie within one sub-sampled voxel of the original data 
0. The model produced by Dual SurfaceNets on the low resolution scans is a 
good estimate of the high resolution model, while true to the input images. 
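
The displacement analysis used for validation can be sketched as a simple statistic over node positions. The text does not say how "within one voxel" is measured, so this sketch assumes a per-axis test in units of the (anisotropic) voxel size; the function name `fraction_within_one_voxel` is illustrative.

```python
import numpy as np

def fraction_within_one_voxel(start, final, voxel_size):
    """Fraction of nodes displaced by at most one voxel along every axis."""
    start = np.asarray(start, dtype=float)
    final = np.asarray(final, dtype=float)
    # Displacement per axis, expressed in voxel units.
    disp = np.abs(final - start) / np.asarray(voxel_size, dtype=float)
    return float((disp.max(axis=1) <= 1.0).mean())
```

With millimeter coordinates and `voxel_size=(0.27, 0.27, 1.00)`, the returned value plays the role of the percentage of points reported above for the femur data.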

Abstract. Clinical reality is full of complex images that cannot be segmented 
automatically with current computer vision technology, requiring 
intensive user intervention. In [1] and [2] we proposed a framework for 
the systematic development of intelligent interactive segmentation tech- 
niques that aim at repeatable and predictable results obtained via effi- 
cient interaction. In this paper we apply this framework to segment the 
joint space boundary of osteoarthritic ankles. The solution is based on a 
heterogeneous boundary representation implemented with a new piece- 
wise deformable model. User intervention is necessary only when this 
model fails, being performed via specialized interactive tools. Results 
obtained by a non-medical user are presented, indicating improvement 
over the manual practice in terms of accuracy and repeatability. 



1 Introduction 

In many clinical applications, to segment means to isolate a part (object) from 
the remainder of the image (background). Segmentation techniques here aim at 
precise, predictable and reproducible delineation of objects of interest, being 
based on prior knowledge about how these are expected to be depicted in terms 
of image and geometric features. Unfortunately objects might be represented in 
the image differently than expected due to conditions intrinsic to medical ap- 
plications, e.g. noise, pathology and low contrast. Segmentation methods fail in 
these cases, with the consequence that human intervention is often needed to 
manually enhance the results obtained with automatic techniques. With current 
technology, the automatic and manual parts are performed as two different and 
independent procedures with possibly inconsistent outcome. [1] and [2] propose 
a systematic approach, called intelligent interactive segmentation (IIS), based 
on the following conception: (1) Automatic and interactive parts are unified into 
one segmentation process; (2) The backbone is a steerable automatic segmen- 
tation method with prior knowledge about image and geometric features; (3) 
Situations when this method fails are limited and attributed to cases where the 
image deviates from the prior knowledge in the segmentation model, which can 
be locally reconfigured to avoid or recover from failure; and (4) User intervention 
is needed in case of failure to provide information used for reconfiguration, 
establishing a user model feedback mechanism called intelligence. 

Here we describe how this framework was used to develop an IIS method for a 
complex clinical application. Sect. 2 presents the application, Sect. 3 describes the 
new method, and Sect. 4 contains an evaluation with a non-clinical experiment. 

2 Clinical Application 

Osteoarthritis (OA) is a joint disorder characterized by the destruction of the 
articular cartilage, subchondral sclerosis, and secondary inflammation. An exper- 
imental treatment called joint distraction is currently applied at the University 
Medical Centre Utrecht, The Netherlands, to patients with OA in the ankle 
joint (Fig. 1a), consisting of a temporary distraction of talus and tibia using an 
Ilizarov external ring fixation Pj. Evaluation is done based on radiographs of 
the control and OA ankles taken at fixed time intervals before and after 
treatment. X-ray images of the ankles are acquired in standardized mortise view¹ and 
digitized with 256 grey levels and variable size (Fig. 1b). 

A current project at the Image Sciences Institute Utrecht aims at the quan- 
tification of the ankle joint space (AJS) width, amount of subchondral sclerosis 
and angle of the joint to evaluate this treatment. Here manual segmentation 
consists of delineating the central part of the upper and lower boundaries of 
the AJS, a task that requires medical knowledge due to very low image quality 
resulting from the projection of concave or overlapping structures (Fig. 1b). 
Currently, the boundaries are approximated by two lines connecting ten points 
indicated by the user via a semi-automated procedure; we call this procedure 
manual because no automatic segmentation method takes part in the process. 
More accurate measurements can be obtained with a more precise boundary, 
and a reduction of variation and bias can be achieved by using image data for 
segmentation. 

3 Intelligent Interactive Segmentation Method 

The method is based on four main components: a heterogeneous model of the 
AJS boundary, a piecewise deformable model (DM) implementing this model, a 
list of cases when this method can fail and the appropriate user corrections, and 
a visual language used for interaction (see details in |S| ) . 

Heterogeneous Boundary Model. The AJS boundary sketched in Fig. 1c is 
modeled by two open and non-intersecting curves divided in five pieces (Fig. 1d). 
A study of manually segmented images and anatomical information showed that 
these pieces are characterized by different combinations of image and geometric 
features, leading to the segmentation model in Table 1. This model is valid for 
most images, but it admits local modification as a consequence of interaction. 

¹ The patient stands with foot turned 20° inwards; acquisition with standard settings. 

Fig. 1. The ankle joint space. (a) Coronal dissection of a normal ankle (from 0). 

(b) Digital image of a control ankle in standardized mortise view (642 x 840 pixels). 

(c) Scheme showing the boundaries of interest (plain lines), the joint space (shaded 
area), and the misleading boundaries (dotted lines), (d) Model for the AJS boundary, 
where circles indicate corners and rectangles indicate the edge type: white or step edges 

Table 1. Segmentation model for the AJS. Image features computed with scale- 
normalized local image structure detectors 0, using special versions for horizontal 
edges (σ is the scale for derivatives L_x, L_y, etc.). Shape features measured by the 
change of the curve’s turning angle 0, with Vi determined from examples 

Piece-DM. The upper and lower boundaries are implemented as two independent 
DMs initialized by curves sketched by the user. Since each boundary piece 
is characterized by different image and shape features, homogeneous techniques 
known from the literature 0 are not suited for this application. The current 
implementation is based on a new DM, Piece-DM, with the following features: 
(1) The boundary is represented by a cubic B-Spline curve to ensure geometric 
continuity, local control, and compact representation; (2) Optimization is performed 
with the conjugate gradients method, affecting only the position of the 
B-Spline control points; and (3) The objective function (1) is implemented as a 
sum of K terms with localized influence on the boundary, called pieces: 

$$\sum_{j=1}^{K} \int w_j(t)\, \Phi_j[C](t)\, dt, \qquad (1)$$

where w_j(t) is the weight of piece j at curve position t, and Φ_j is the objective 
function associated with piece j, implemented as a weighted sum of terms measuring 
the deviation of a curve segment from prior knowledge. This deviation is 




Intelligent Interactive Segmentation of the Ankle Joint Space 397 



Table 2. Cases in which the model fails, the possible causes, and the corresponding 
segmentation model correction or tuning to obtain the desired result 

Failure | Cause | Model Correction 
--- | --- | --- 
1. Wrong curve | Wrong initial, or deformation | Modify curve 
2. Unseen visual evidence | Image intensity profile is different from expected | Locally replace the image feature detector 
3. Low or absent visual evidence | Flat image intensity profile | Keep the curve locally near the position indicated by the user 
4. Wrong visual evidence | Another structure disturbs the correct boundary identification | Locally reduce the weight of the image feature 
5. Deviation from local shape | Corners are wider/narrower, or stretches are smoother/rougher | Locally modify expected turning angle values 



computed with the Mahalanobis distance [2], i.e., it is normalized to the range 
of expected values obtained from a sample data set, providing intuitive and pre- 
dictable behavior. This Piece-DM is heterogeneous and flexible to accommodate 
local corrections resulting from interaction, by adding or replacing pieces and 
locally tuning the weights and range of expected feature values. 
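
As an illustration of this normalization, the sketch below computes the Mahalanobis distance of a feature value from the statistics of a sample set. It is a generic formulation rather than the authors' code; the function name `mahalanobis` and the use of the unbiased sample covariance are assumptions.

```python
import numpy as np

def mahalanobis(x, samples):
    """Distance of feature vector x from the sample mean, in units of spread."""
    s = np.asarray(samples, dtype=float)
    if s.ndim == 1:
        s = s[:, None]                     # treat 1D input as n scalar samples
    mean = s.mean(axis=0)
    cov = np.atleast_2d(np.cov(s, rowvar=False))   # unbiased sample covariance
    diff = np.atleast_1d(np.asarray(x, dtype=float)) - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

For a scalar feature this reduces to the absolute deviation from the sample mean divided by the sample standard deviation, so a value of 1 means "one standard deviation away from what the examples suggest", regardless of the feature's original units.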

Cases of Failure. Failure happens when the contour deviates from expectations 
in terms of image features (visual evidence) and/or local shape features. A limited 
number of cases were identified as a result of the systematic analysis proposed in 
0 - see Table 2, where these are presented together with the causes for failure 
and the corresponding DM correction. These cases are detected as a consequence 
of interaction (see Table 3). 

User-Computer Interaction. Interaction is limited to three situations: ini- 
tialization, model correction or confirmation, and acceptance of the final result. 
During initialization, the user adjusts a template to the image which is used 
to build a Piece-DM based on the segmentation model in Table 1. The DM is 
then displayed using an intuitive abstraction (Fig. 2a): an open curve for the 
boundary, and arrows for the “deformation forces,” indicating the preferred 
direction of local boundary motion resulting from optimization. The boundary is 
also displayed in separate windows using other image detectors as background 
(e.g. Fig. 2b), to help the identification of failure case #2 in Table 2. 

This visual information enables the user to plan the next action (see Table 3), 
essentially confirming or indicating the need for correction of current DM 
settings. Confirmation activates the optimization process (if forces are still large) or 
ends segmentation (if forces are small). In other situations, a case of failure from 
Table 2 is determined based on the internal status, and the model is corrected 
accordingly (see Table 2). Example: in Fig. 2a the boundary position is wrong, 
but the forces roughly point to the right orientation; the user therefore confirms 
the model, and the program optimizes it until the forces are very small (Fig. 2c). 

Fig. 2. Visual information showing the lower boundary and forces. (a) Boundary and 
deformation forces after initialization, with the grey image as background. (b) Boundary 
after initialization, with the horizontal step-edge detector as background (dark areas 
correspond to high response). (c) Boundary after deformation (invisible forces) 



Table 3. Summary of visual conditions observed by the user (boundary position and 
orientation of deformation forces), the corresponding user actions, internal conditions 
analyzed automatically, and the interpretation by the program in terms of Table 2 

Visual Status: Boundary Pos. | Visual Status: Forces Orient. | User Action | Internal Status (Force Mag.) | Program Action 
--- | --- | --- | --- | --- 
wrong | wrong | drag curve | - | Failure 1 
right | wrong detector | point new detector | - | Failure 2 
right | wrong | “freeze” curve | small image force | Failure 3 
right | wrong | “freeze” curve | large image force | Failure 4 
right | wrong | “freeze” curve | large shape force | Failure 5 
wrong | right | confirm | large | Optimize 
right | unseen | confirm | very small | End 



4 Results and Discussion 

An evaluation of intra-operator variation was performed as follows. Ten images 
were segmented by a non-medical user, three times each, with one day and one 
week interval among sessions (see qualitative results in Fig. 3a/b). The intra-operator 
variation was quantified by the distance in pixels from all points in the 
central piece of one curve to all the other curves. Results show agreement within 
one pixel for most images, with smaller variation for the lower boundary (μ_up = 
1.56 ± 1.62,² μ_low = 0.55 ± 0.37). This represents a significant improvement 
(> 50%) over the results obtained with the manual method (μ_up = 3.24 ± 1.68, 
μ_low = 2.19 ± 1.29). 
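
The quoted improvement can be checked directly from the reported mean distances. The short calculation below uses only the four mean values given above; the helper name `relative_reduction` is illustrative.

```python
def relative_reduction(manual, interactive):
    """Fractional decrease of the mean distance relative to the manual method."""
    return (manual - interactive) / manual

# Mean intra-operator distances in pixels, as reported in the text.
upper = relative_reduction(3.24, 1.56)  # upper boundary, roughly 0.52
lower = relative_reduction(2.19, 0.55)  # lower boundary, roughly 0.75
```

Both boundaries therefore show a reduction of more than one half, with the larger gain on the lower boundary.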

It is difficult to validate the correctness of results in this application because 
the truth is not exactly known; for this purpose, evaluation with medical users 
is essential. For a qualitative assessment of correctness, we compared results of 
the IIS method (non-medical user) to those obtained manually (medical user), 
obtaining the following conclusions: (1) Interactive results agree with manual 
boundaries to a large extent (e.g. Fig. 3c); (2) Agreement is bad under low visual 
evidence (Fig. 3d); and (3) Agreement is better for lower boundaries (μ_up = 
5.93 ± 4.47, μ_low = 3.34 ± 2.11). 

² μ_up refers to the upper boundary, and μ_low to the lower. Values in pixels. 

Fig. 3. Best and worst cases for (a,b) intra-operator variability and (c,d) matching 
between interactive and manual results. Lines refer to interactive results and crosses 
correspond to points indicated with the manual method. (a) Best agreement (μ_up = 
0.46, μ_low = 0.13). (b) Worst agreement (μ_up = 5.51, μ_low = 1.29). (c) Best match 
(μ_up = 2.09, μ_low = 2.34). (d) Worst match (μ_up = 15.21, μ_low = 9.22) 

5 Conclusions 

Results indicate that the interactive method described here provides repro- 
ducible and precise delineation of the ankle joint space boundary by means of 
an efficient interaction process, with significant improvement over the manual 
practice. This method was developed in four months from a strategy [2] that can 
be adopted in other situations in which the segmentation task is too complex to 
rely completely on an automatic method. We conjecture that interactive-steered 
segmentation will be helpful in the majority of clinical applications. 

Abstract. In this paper a novel model-driven segmentation approach 
for thoracic MR images is presented. The goal of this work is to coarsely, 
but fully automatically localize the boundary surfaces of the heart and 
lungs in thoracic MR sets. The major organs in the thorax are described 
in a three-dimensional analytical model template by combining a set of 
fuzzy implicit surfaces by means of Constructive Solid Geometry, and 
formulating model registration as an energy minimization. The method 
has been validated on 20 thoracic MR volumes from two centers (patients 
and normal subjects). On average 90% of the contour length of the heart 
and lung contours was localized with sufficient accuracy (average 6 mm 
positional error) to automatically provide the initial conditions for a 
subsequently applied locally accurate segmentation method. 



1 Introduction 

Though many automated segmentation methods for thoracic Magnetic Reso- 
nance image data have been described, many of these methods require at some 
point user interaction in the form of a seed point, volume of interest or ini- 
tial boundary model. To further automate this initial image interpretation step, 
integration of prior knowledge in the form of an anatomical model is essential. 

The goal of this work is to develop a hybrid anatomical knowledge represen- 
tation suitable to coarsely, but fully automatically localize the heart and lung 
surfaces in thoracic MR images. The model described here combines the context-preserving 
properties of volume-based methods (e.g. [1]) with the compactness 
of surface-based (e.g. [2,3]) models by modeling multiple organs in their spatial 
context as a set of 3D fuzzy implicit surface templates. This template-based 
approach provides a number of key benefits, which can be summarized as follows: 

— though limited in flexibility, it is intrinsically three-dimensional without 
requiring point-correspondence, 

— it simultaneously captures the 3D shapes and spatial context of multiple (in 
this application 6) organs in a single, closed-form energy function, 

— it enables a fast, fully automatic image registration by integrating prior 
knowledge about local image gradient polarity in the matching criterion. 



2 Methods 



2.1 Implicit Solid Modeling 

Let a regular implicit surface be given in the form f(x, y, z) = c. An approximation 
of the Euclidean distance d(x, y, z) of a point (x, y, z) near the surface is 
given by: 

$$d(x, y, z) = \frac{f(x, y, z) - c}{\|\nabla f(x, y, z)\|}. \qquad (1)$$



From this signed distance estimate d(x, y, z), a scalar field v(x, y, z; w) can be 
derived which expresses the implicit surface as a fuzzy membership function: 

$$v(x, y, z; w) = \begin{cases} 1, & \text{if } d(x, y, z) < -w, \\ \frac{1}{2} - \frac{d(x, y, z)}{2w}, & \text{if } |d(x, y, z)| \le w, \\ 0, & \text{if } d(x, y, z) > w. \end{cases} \qquad (2)$$



When traversing the surface along the surface normal vector, v(x, y, z; w) 
describes a gradual, approximately linear transition (width 2w) between the state 
‘inside’ (v(x, y, z; w) = 1) to ‘outside’ (v(x, y, z; w) = 0). 
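
The two definitions above can be tried out on a simple primitive. The sketch below assumes the linear ramp between 1 (inside) and 0 (outside) described in the text, and uses a sphere as an example surface; the sphere, and the names `signed_distance`, `membership`, `sphere` and `sphere_grad`, are illustrative assumptions and not the hyperquadric primitives used later.

```python
import numpy as np

def signed_distance(f, grad_f, point, c):
    """First-order distance estimate d = (f - c) / ||grad f||, as in Eq. (1)."""
    g = np.asarray(grad_f(point), dtype=float)
    return (f(point) - c) / np.linalg.norm(g)

def membership(d, w):
    """Fuzzy membership: 1 inside, 0 outside, linear ramp of width 2w."""
    return float(np.clip(0.5 - d / (2.0 * w), 0.0, 1.0))

# Example primitive (assumption): the sphere x^2 + y^2 + z^2 = c.
def sphere(p):
    return float(np.dot(p, p))

def sphere_grad(p):
    return 2.0 * np.asarray(p, dtype=float)
```

For the unit sphere (c = 1) the estimate at (2, 0, 0) is 0.75 rather than the true distance 1, which illustrates why the approximation is only stated for points near the surface.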

However, single implicit object models are intrinsically limited in their de- 
scriptive shape range. To extend the descriptive power of single implicit surfaces, 
a well established framework is provided by Constructive Solid Geometry (CSG), 
which allows the description of a 3D object shape by decomposing it into 
a Boolean combination of simpler shapes. CSG is often implemented as a tree 
structure, in which the leaf nodes contain a shape descriptor of the shape prim- 
itives and the internal nodes implement the Boolean set operators. All nodes 
contain a transformation, which translates, rotates and scales the shape mod- 
eled in that particular node with respect to the other objects in the tree. 

Classically, CSG is implemented in the form of a Boolean point classification 
function, which classifies a point as inside or outside of the object. By replacing 
the crisp Boolean set operators by fuzzy set equivalents, CSG is applicable to 
express a composite shape as a membership function. The following fuzzy set 
operators were adopted from |^: 

— Complement: $\sim v(x, y, z) = 1 - v(x, y, z)$ 

— Union: $v_1(x, y, z) \cup v_2(x, y, z) = \max(v_1(x, y, z), v_2(x, y, z))$ 

— Intersection: $v_1(x, y, z) \cap v_2(x, y, z) = \min(v_1(x, y, z), v_2(x, y, z))$ 

Note that for two primitives with equal surface gradient polarity (e.g. pointing 
outward), the combined shape’s polarity is pointing outward as well. 
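
The three fuzzy set operators translate directly into element-wise array operations. The sketch below applies them to membership fields sampled on a grid; the function names, and the `difference` helper built from the operators, are illustrative additions.

```python
import numpy as np

def complement(v):
    """Fuzzy complement: 1 - v."""
    return 1.0 - np.asarray(v, dtype=float)

def union(v1, v2):
    """Fuzzy union: element-wise maximum of two membership fields."""
    return np.maximum(v1, v2)

def intersection(v1, v2):
    """Fuzzy intersection: element-wise minimum of two membership fields."""
    return np.minimum(v1, v2)

def difference(v1, v2):
    """Hypothetical helper: v1 minus v2, built from the operators above."""
    return intersection(v1, complement(v2))
```

Because each operator returns another membership field with values in [0, 1], the outputs can be fed back into further operators, which is what allows a CSG tree to be evaluated node by node.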

From v(x, y, z; w) a boundary membership functional b(x, y, z; w) can be derived, 
which is maximal exactly on the boundary surface (see Fig. 1): 

$$b(x, y, z; w) = \ldots \qquad (3)$$



Fig. 1. Examples of the application of fuzzy set operators to two fuzzy implicit curves 

2.2 Anatomical Thorax Model Construction 

By combining a number of fuzzy implicit surfaces with continuous CSG, a coarse, 
3D shape description of a moderately complex scene can be constructed in the 
form of a potential function b(x, y, z; w). A set of such implicit shape templates 
of the major organs in the thorax has been constructed in the following steps: 

1. Data acquisition: a gated transverse MR image volume of a normal thorax 
was acquired, in which the contours of the heart, both lungs, thoracic wall, 
liver and spleen were drawn manually. Contours were subsampled to form 
regularly tessellated 3D point meshes, which were manually subdivided into 
approximately convex surface patches. 

2. Implicit surface fitting: the overall shape of these point grids was modeled by 
fitting an implicit surface to the point data. In this work the hyperquadric 
shape models jS| of six terms were selected, mainly because hyperquadrics 
compactly describe a large range of non-symmetric 3D shapes. 

3. Organ model construction: for each organ the fitted primitives were grouped 
into small CSG-trees, forming a three-dimensional shape template for each 
organ. In the top node of each organ CSG-tree, a polarity direction was 
defined for the organ surface normal based on the three-dimensional image 
gradient direction of that particular organ in a typical thoracic MR volume. 
For organs containing air (both lungs), the model normal vector was defined 
as pointing inwards. For all other organ models (heart, liver, spleen and 
thoracic surface), the model polarity was defined as pointing outwards. 

4. The separate organ templates were hierarchically grouped into two scene 
trees: a tree describing the lungs, heart, cardiac ventricles, liver, spleen and 
the thoracic outer surface, and a separate tree merely describing the tissue-air 
surfaces in the thorax (both lungs and the exterior thorax wall). 

2.3 Model Matching 

The model-image registration is based on the transitional boundaries between 
air and other tissues in thoracic volume scans for two reasons. Firstly, air is 
relatively robustly automatically separable from other tissues in a thoracic MR 
volume. Secondly, in a thoracic MR scan the image gradient vector in tissue-air 
surfaces can be defined to point towards the air, i.e. for the lungs as pointing 
inwards, and for the torso as pointing outwards. 

Consider a set of N points located on the transitions between tissue and air in 
a thoracic MR dataset. By combining the model-contained distance information 
with the image gradient direction, the following energy functional can be 
formulated, which has a strong minimum when the tissue-air model is registered to a 
feature pattern congruent to the model template shapes: 

$$E(\vartheta_{node}) = \sum_{i=1}^{N} \bigl(1 - b(\vartheta_{node}; x_i, y_i, z_i)\bigr)\, q_i(x_i, y_i, z_i). \qquad (4)$$

ϑ_node = (s_x, s_y, s_z, t_x, t_y, t_z, r_x, r_y, r_z) represents the affine scaling (s_{x,y,z}), 
translation (t_{x,y,z}), and orientation (r_{x,y,z}) parameters in a CSG node, and (x_i, y_i, z_i) 
is a candidate boundary point. The weighting function q_i(x_i, y_i, z_i) is defined 
as a switch selecting feature points in which the local hyperquadric gradient 
∇_model(x_i, y_i, z_i) and the image gradient ∇_image(x_i, y_i, z_i) point in the same 
direction, within a margin φ: 

$$q_i(x_i, y_i, z_i) = \begin{cases} 1, & \text{if } \nabla_{image}(x_i, y_i, z_i) \cdot \nabla_{model}(x_i, y_i, z_i) \ge \cos\varphi, \\ 0, & \text{if } \nabla_{image}(x_i, y_i, z_i) \cdot \nabla_{model}(x_i, y_i, z_i) < \cos\varphi. \end{cases} \qquad (5)$$
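
The polarity switch can be sketched as a comparison of gradient directions. The sketch assumes that both gradients are normalized to unit length before the dot product is taken, so that the threshold is a true angular margin; the function name `polarity_switch` and the default of 60 degrees (the value reported for the experiments) are illustrative choices.

```python
import numpy as np

def polarity_switch(grad_image, grad_model, margin_deg=60.0):
    """Return 1 where image and model gradients agree within the margin, else 0."""
    gi = np.asarray(grad_image, dtype=float)
    gm = np.asarray(grad_model, dtype=float)
    eps = 1e-12  # guards against division by zero for vanishing gradients
    gi = gi / np.maximum(np.linalg.norm(gi, axis=-1, keepdims=True), eps)
    gm = gm / np.maximum(np.linalg.norm(gm, axis=-1, keepdims=True), eps)
    cos_angle = (gi * gm).sum(axis=-1)
    return (cos_angle >= np.cos(np.radians(margin_deg))).astype(float)
```

Feature points whose image gradient points against the model polarity, for example the outer thorax wall seen from a lung template, receive weight 0 and so do not contribute to the match.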

The actual model-image matching is performed in the following steps: 

1. Feature point detection. To detect points on the tissue-air boundaries, a 
simple adaptive thresholding was implemented, based on a characteristic 
‘air’ peak in the lower gray value range in the histogram of a thoracic MR 
volume. 

2. Initial model positioning. To initialize the matching, a fixed initial parameter 
set for the top node pose and scale parameters was selected. This parameter 
set positions the model in the middle of the scanner bore, aligned with its 
long axis. Angle parameter φ in (5) was set to 60 degrees. 

3. Energy minimization. The energy minimization can be formulated as a hierarchical 
pose and scale estimation. First, E(ϑ_node) is minimized for the 
top node affine parameters ϑ_node using a gradient descent method, thereby 
simultaneously translating, rotating and scaling the whole model until a 
minimum is reached. Subsequently, the top node parameters are frozen, and 
E(ϑ_node) is further minimized with respect to ϑ_node of one of the subtrees 
for a single organ or a combination of organs. By repeating this procedure 
throughout a number of tree levels, the match is refined in each matching 
step. In the top node matching step, boundary width parameter w was set 
to 50 mm, whereas in all subsequent refining steps for lower tree levels w 
was set to 20 mm. In Fig. 2, an example is given of a matching result. 

Fig. 2. A matching result for a gated short-axis cardiac MR-set. In the top row, the 
initial model is projected on three slice levels, whereas the bottom row shows the model 
after matching. The displayed boundaries are 2D cross-sections through a 3D model 



3 Clinical Validation and Discussion 

The model matching procedure was validated on 20 thoracic volume scans 
routinely acquired from 17 patients with various cardiac pathologies and 3 normal 
subjects. To assess the accuracy of the method for both lungs and the heart, 
image volumes containing all these organs were acquired, in this study the so-called 
localizers or scout views. To investigate the method’s dependency on the MR 
imaging system, the studies were acquired at two centers using different MR 
scanners: a GE Signa-LX real-time CVMR scanner and a Philips Gyroscan NT 
10. Each image set consisted of 27 images (9 sagittal, 9 coronal, 9 transverse). On 
each image set, the model-image registration was performed fully automatically. 

To quantitatively assess the accuracy of the matching results, 9 frames were 
selected from each image volume by an independent observer. In all these images, 
two observers manually traced the contours of the left lung, right lung and the 
epicard. Two quantitative measures were calculated to express the accuracy of 
the model-predicted contours. Since the model only coarsely describes thoracic 
anatomy, it was expected that local details possibly drawn by the observers (ves- 
sels, small local shape variations) are missed by the model. Therefore, for each 
organ contour a quality measure was defined as the percentage of the contour 
length correctly predicted by the model within a 20 mm margin on each side of 
the organ surface. Second, for the correctly localized contour parts, the average 
distance of the contours to the corresponding model surface was calculated. 
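
The two accuracy measures can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study: it assumes that the manual contour is sampled at roughly equal spacing (so that the fraction of points approximates the fraction of contour length) and that the model surface is available as a set of sample points; the function name and the toy coordinates are hypothetical.

```python
import math


def contour_agreement(contour, model_points, margin_mm=20.0):
    """Return (percentage of contour within the margin, mean distance of that part).

    contour, model_points: sequences of (x, y) coordinates in mm.
    """
    # Distance of every manual contour point to the nearest model point.
    dists = [min(math.dist(p, q) for q in model_points) for p in contour]
    inside = [d for d in dists if d <= margin_mm]
    percentage = 100.0 * len(inside) / len(dists)
    mean_distance = sum(inside) / len(inside) if inside else float("nan")
    return percentage, mean_distance


# Toy example: three contour points lie 5 mm from the model, one lies far away.
manual = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (100.0, 0.0)]
model = [(0.0, 5.0), (10.0, 5.0), (20.0, 5.0)]
pct, mean_d = contour_agreement(manual, model)
```

In this toy example 75% of the contour is counted as correctly localized, and the average distance is computed over that part only, mirroring the two-stage definition above.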

Participating centers: University of Iowa Hospitals and Clinics, Iowa City, USA; 
Leiden University Medical Center, Leiden, The Netherlands. 

The results of this validation study can be summarized as follows: 

— In all 20 cases, the automated matching converged to a semantically correct 
solution, demonstrating matching robustness with respect to initial position 
under clinically realistic circumstances. The influence of large amounts of 
spurious feature points generated in the low-level boundary detection step 
was negligible, since only feature patterns congruent to the model template 
shapes influenced the matching energy function. Furthermore, the matching 
was found to be scanner- independent for image sets acquired with the body 
coil of the scanner. 

— On average, 90 % of the contour length of the manually drawn lung and 
epicardial contours was localized within a 20 mm margin (worst case: 79%). 
As expected, in cases where the observers had drawn a structure not present 
in the model, these contour parts were missed by the model. A large part of 
the failure rate for a contour (0-21%) could be attributed to vessel structures 
in the cardiac in- and outflow tract, which were not included in the model. 

— The average distance of the contour parts contained within 20 mm of the 
automatically identified surfaces ranged from 4-8 mm. 

— In general the results from both observers for the entire study corresponded 
well, though in some cases there was a slight discrepancy. 

The computation time required to match the model increases approximately 
linearly with the number of images. For a scout view consisting of 27 images, the 
matching procedure took 4-6 minutes on a Sun Ultrasparc 2 workstation. Initial 
experiments were also performed where the model was matched to three images 
(1 sagittal, 1 coronal and 1 transverse), and qualitatively only minor differences 
in the matching results were visible. In these cases the matching procedure took 
less than 30 seconds. Based on the presented validation, it can be concluded 
that with the described anatomical modeling and matching method, a robust 
estimate of the approximate location of the heart and lung surfaces in a thoracic 
MR image set can be obtained fully automatically. Though the modeling method 
lacks local detail, on average 90% of the contour length of the lung- and epicardial 
contours was localized within 20 mm, with an average positional error of 6 mm. 

Abstract. We describe a method for labelling image structure based on 
non-linear scale-orientation signatures which can be used as a basis for 
robust pixel classification. The effect of normalisation of the signatures is 
discussed as a means to improve classification robustness with respect to 
grey-level variations. In addition, model data selection and scale normali- 
sation are investigated as a means to improve the robustness of detection 
with respect to the scale of structures. 



1 Introduction 

We are interested in the detection of structures in images. We assume that the 
position of these structures is unpredictable and that they will be embedded 
in a background texture. We describe an approach based on the construction 
of a non-linear scale-orientation signature at each pixel. This provides a very 
rich description of local structure which is robust and locally stationary. Given 
this description, standard statistical classification methods can be used - we give 
results for a linear classifier. To improve detection with respect to local grey-level 
variation and scale change, intensity and scale normalisation are investigated. The 
effects of different strategies for selecting training data are also explored. 

2 Scale-Orientation Signatures 

A Directional Recursive Median Filter (DRMF) performs a smoothing operation 
that removes (sieves) image peaks or troughs of less than a chosen size P . By 
applying sieves of increasing size to an image and taking the difference between 
the output image from adjacent size sieves, it is possible to isolate image features 
of a specific size. Signatures at different positions on the same structure are 
similar (local stationarity) and the interaction between adjacent structures is 
minimised. The signature, a function of scale $\sigma$ and orientation $\theta$, is a 2-D array 
in which the columns represent measurements for the same orientation and the rows 
represent measurements for the same scale. For typical synthetic examples of 
signatures see Fig. 1. 
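
The idea of isolating features of a specific size from the difference of adjacent sieves can be illustrated with a strongly simplified sketch. The code below is not the DRMF itself: it applies a plain 1-D recursive median along a single direction, with window length 2s + 1 at scale s and clamped edges, all of which are assumptions made here for illustration.

```python
def recursive_median(signal, s):
    """One 1-D recursive median pass with window 2*s + 1 (edges clamped)."""
    n = len(signal)
    out = list(signal)
    for i in range(n):
        past = [out[max(j, 0)] for j in range(i - s, i)]              # already filtered
        ahead = [signal[min(j, n - 1)] for j in range(i, i + s + 1)]  # not yet filtered
        out[i] = sorted(past + ahead)[s]
    return out


def scale_bands(signal, max_scale):
    """Differences between the outputs of sieves of adjacent size."""
    bands, current = [], list(signal)
    for s in range(1, max_scale + 1):
        sieved = recursive_median(current, s)
        bands.append([a - b for a, b in zip(current, sieved)])
        current = sieved
    return bands


# A two-sample pulse survives scale 1 and is removed at scale 2,
# so it appears only in the second band.
pulse = [0, 0, 0, 5, 5, 0, 0, 0, 0]
bands = scale_bands(pulse, 3)
```

Collecting the band values at one pixel for several scales and, in 2-D, several filter directions would give one scale-orientation signature of the kind described above.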

Fig. 1. Some synthetic examples of multi-scale DRMF signatures, where the larger 
four images show (a) a binary blob, (b) a binary linear structure, (c) a Gaussian 
blob, and (d) a Gaussian linear structure. The twelve smaller images are the scale- 
orientation signatures for the centre pixel (top), for a pixel at the extreme edge of 
the structure (bottom) and for a pixel in between these two extremes (middle). In the 
scale-orientation signature images, scale is on the vertical axis (with the finest scale at 
the bottom) and orientation on the horizontal 



3 Statistical Methods 

The objective of the work is to classify pixels, that is to label each pixel as 
belonging to a certain type of image structure. Since any method is likely to be 
imperfect it is useful to explore a range of compromises between false negative 
errors (poor sensitivity) and false positive errors (poor specificity). Detection can 
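be performed by thresholding a class probability image. The probability density 
for an observation vector $x_j$ for a pixel of class $i$ is given by 

$$p(x_j \mid i) = \frac{1}{(2\pi)^{d/2}\,|C_i|^{1/2}} \exp\left(-\tfrac{1}{2} S_{ij}^2\right) \qquad (1)$$

where $S_{ij}$ is the Mahalanobis distance to the class mean, $C_i$ is the 
covariance matrix of class $i$, and $d$ is the dimensionality of $x_j$. Applying Bayes' 
theorem, a probability image for class $i$ (e.g. blob) out of $\eta$ classes is found by 
calculating, for each pixel, 

$$P(i \mid x_j) = \frac{P(i)\, p(x_j \mid i)}{p(x_j)}, \qquad p(x_j) = \sum_{k=1}^{\eta} P(k)\, p(x_j \mid k) \qquad (2)$$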
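
The class probability computation can be sketched as follows. The sketch assumes a Gaussian class-conditional density and, to stay free of external dependencies, a diagonal covariance matrix, so that the squared Mahalanobis distance reduces to a variance-weighted sum. The priors, means and variances are made-up illustration values, and the function names are hypothetical.

```python
import math


def gaussian_density(x, mean, var):
    """Gaussian density with diagonal covariance (an assumption of this sketch)."""
    # Squared Mahalanobis distance to the class mean.
    d2 = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    norm = math.sqrt((2.0 * math.pi) ** len(x) * math.prod(var))
    return math.exp(-0.5 * d2) / norm


def class_posteriors(x, classes):
    """Bayes' theorem: posterior probability of every class for observation x.

    classes maps a class name to (prior, mean vector, variance vector).
    """
    joint = {name: prior * gaussian_density(x, mean, var)
             for name, (prior, mean, var) in classes.items()}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


classes = {
    "blob": (0.5, (0.0, 0.0), (1.0, 1.0)),
    "background": (0.5, (4.0, 0.0), (1.0, 1.0)),
}
posterior = class_posteriors((0.0, 0.0), classes)
```

Evaluating the posterior of one class at every pixel gives the probability image that is subsequently thresholded.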



4 Signature Preprocessing 

Principal component analysis can be used to obtain data generalisation and 
efficiency for classification purposes, by reducing the dimensionality of the data, 
instead of using the full signature information Pj . 

We intensity normalise signatures since there is no reason to believe that 
high-contrast features are more important than those of low contrast. Indeed, it 
is particularly important to detect small, low contrast lesions of characteristic 
appearance. Each column in each signature is normalised independently. 

We would like to treat different size features equally. A change in scale ap- 
pears in the signatures as a vertical shift. The effects of such a shift can be 
reduced by taking the FT of each column and using the amplitude term. 
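
The effect of the two normalisation steps can be checked on a single signature column. The sketch below is only an illustration: it assumes that intensity normalisation divides a column by its largest absolute value (the text does not specify the normalisation), and it uses a discrete Fourier transform, whose amplitude is exactly invariant only to circular shifts.

```python
import cmath


def normalise(column):
    """Scale a column so that its largest absolute value is 1 (assumed scheme)."""
    peak = max(abs(v) for v in column)
    return [v / peak for v in column] if peak else list(column)


def dft_amplitude(column):
    """Amplitude of the discrete Fourier transform of one signature column."""
    n = len(column)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(column)))
            for k in range(n)]


# The same structure seen at two scales: the response moves along the column.
column = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 3, 1, 0]
same = all(abs(a - b) < 1e-9
           for a, b in zip(dft_amplitude(column), dft_amplitude(shifted)))
```

Because only the phase of the transform changes under a shift, the two columns yield the same amplitude vector, which is the property exploited for scale normalisation.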

5 Mammographic Data 

The mammographic data we have used consists of 54 mammograms, of which 
27 contain a spiculated lesion and the other 27 are normal mammograms from a 
sequential set. The outlines of all the lesions have been annotated by an expert ra- 
diologist. The sizes of the abnormalities range from 5 to 30 mm (mean=13.4 mm). 

We build statistical models based on a subset of all the signatures in the 
dataset. The first approach is based on a subset of the data which uses all the 
signatures within the abnormalities and an equal number (150 per mammogram 
in this case) randomly selected from the normal images (this will be referred 
to as the basic signature dataset). This means that the subset contains a larger 
number of signatures from the larger sized lesions than from the smaller ones. 
In a second approach, to remove bias towards signatures from larger lesions, only 
150 signatures were selected from each abnormal mammogram. 

6 Signature Classification — Training Data 

The ROC results for the basic signature dataset are shown in Fig. 2a. These 
indicate that the PCA based model has an overall better performance regardless 
of the normalisation approach used. A second observation is that the FT based 
normalisation on its own does not do better than classification on the original 
data, but that intensity normalisation, alone or in combination with the FT 
approach, provides overall better classification results. Finally, if we compare the 
85% PCA based model results with the 100% based model results, an overall 
better classification performance is achieved by the former. 





Fig. 2. ROC results for signatures based on: (a) the basic signatures model, (b) the 
150 signatures model. Here [xi: 85% PCA data, △: normalised 85% PCA data, □: FT 
of the normalised 85% PCA data, ○: FT of the 85% PCA data, ▽: 100% data, ×: 
normalised 100% data, *: FT of the normalised 100% data, and +: FT of the 100% 
data 



The ROC results for the model based on the data containing 150 signatures 
from each mammogram are shown in Fig. 2b. Again, the 85% PCA data based 
model provides superior classification results when compared with the 100% 
data based model. Also, the best classification is based on the 85% PCA based 
model for which the signatures have been normalised and subsequently an FT 
was taken. 



7 Probability Images 

Based on the results presented in Sect. 6, we have applied the derived models 
for the basic signatures dataset, the 150 signature dataset, and the FT of the 
normalised signatures of both these datasets to all the pixels in all the 
mammograms. Some typical examples of probability images (see Sect. 3) are shown in 
Fig. 3. 

When comparing the 85% PCA and 100% models, the probability images 
based on the 85% PCA data produce clearer results with fewer disconnected 
regions. The lesion is detected with a high probability in the models based on 
the FT of the normalised signature data (although with a large number of small 
false positive regions). 



8 Classification 

Pixel classification results based on the probability images were a general con- 
firmation of those results presented in Sect. 6. 

Instead of performing pixel classification, the probability images can be used 
for region classification which is more appropriate for the envisioned prompt- 
ing environment. Normally, probability images are thresholded and regions are 
grown based on the resulting binary images fj. This is done for a number of 
thresholds to obtain points on a FROC curve. However, one drawback of such an 
approach is that for low threshold values large regions of the breast are detected 
which are non-localised and produce misleading results. 

To improve upon this approach we have segmented the probability images 
prior to thresholding and the resulting regions are preserved in the subsequent 
classification results. To obtain the segmentation we have found convex regions 
(peaks) in the probability images (after applying a morphological smoothing). 

In Fig. 4 the FROC results for the basic signature model are shown. Results 
for detected lesions larger than 4 mm in diameter (Fig. 4a) and larger than 
8 mm in diameter (Fig. 4b) are shown. The first observation is that the method 
performs better for the larger lesions. Again there is an improved performance 
for the FT of the normalised data versus the basic signature data, and for the 
85% PCA data versus the 100% data. 

The FROC results for the 150 signature data are shown in Fig. 5. In this 
case there is a larger distinction between the detected lesion sizes. This means 
that the performance for the larger lesions was not as good as for the smaller 
lesions, whilst the results shown in Fig. 4 show less size dependence. 
These unexpected results (as the 150 signature models were expected to be less 
size dependent) might be an artifact of our new approach to obtaining FROC 
curves and will need further investigation. 

Fig. 3. Example of applying the classification approach to a mammogram: (a) original 
mammogram, (b) 85% PCA model based probability image, (c) 100% model based 
probability image, (d) lesion annotation, (e) FT of normalised 85% PCA model based 
probability image, (f) FT of normalised 100% model based probability image. The 
probability images are displayed on log-scale, with white representing 1.0 and black 
representing 2 x 10“® 

9 Conclusions 

We have described a number of methods to improve detection of mammographic 
lesions based on scale-orientation signatures. These methods involve the normal- 
isation of the scale-orientation signatures. A method incorporating both inten- 
sity and scale normalisation proved to be most successful in the classification of 
mammographic data. 

It seems that the results presented are fairly independent of the choice of 
data on which the models are based (the best results are obtained for models 
based on taking all signatures within the lesions into account). 

Fig. 4. FROC results for mammograms based on the basic signatures model: (a) 4 mm 
and (b) 8 mm detection area, where △: 85% PCA data, □: FT of the normalised 85% 
PCA data, ○: 100% data, and ▽: FT of the normalised 100% data 





Fig. 5. FROC results for mammograms based on the 150 signatures model: (a) 4 mm 
and (b) 8 mm detection area, where △: 85% PCA data, □: FT of the normalised 85% 
PCA data, ○: 100% data, and ▽: FT of the normalised 100% data 



Basically, the approach described so far is a method for the detection of 
blob-like structures in mammograms (or other images). However, to reduce the 
number of false positive detections we are investigating a reclassification approach. 
With such an approach a second statistical model would be built based on the 
data which has a high probability of being a blob. This would result in a two-class 
blob-model representing blobs in normal and abnormal mammograms. 

Abstract. Image transforms used to preprocess mammogram images 
and highly selective microcalcification feature extraction are analyzed in 
this paper. It is demonstrated that the results obtained by the proposed 
method (especially at high true positive rates) exceed the specificity lev- 
els of other measures for which FROC analysis is provided using the 
same public (Nijmegen) database. 



1 Introduction 

We have based our research on the results of N. Karssemeijer, who has described 
in PE! a method for an iso-precision scale transform of mammograms and the 
clusterization of individual microcalcifications using an iterated statistical model. 
During development of our algorithms we employed the images used in [2] and 
made available for public use (Nijmegen database), but we have also tested 
them on a larger set of mammograms digitized using our system at the National 
Institute of Oncology (Budapest, Hungary). 

We have found that the preprocessing step has a major impact on the detection 
results, as is also stated in [2]. Therefore we have first compared different 
image transforms in terms of their effect on the detectability of individual mi- 
crocalcifications. We also investigated whether the iso-precision scale transform 
can be further improved by a subsequent locally adaptive noise equalization 
transform. We report on our experiments and findings in Sect. 2. 

With respect to the detection of individual microcalcifications, several meth- 
ods have been proposed and studied so far. In this paper we propose one single 
feature which we have found to be very selective for the detection of microcalcifi- 
cations. This is based on a local contrast measure we have specifically designed to 
characterize microcalcifications. By using two additional measures (a compact- 
ness measure and the result of a line detector) the results can be improved even 
further, but we have found that the effectiveness of our algorithm is primarily 
based on the high selectivity of the contrast measure with respect to 
microcalcifications. The proposed contrast measure is defined in Section 3. In the same 
section we compare our results to the results of N. Karssemeijer and others who 
have provided FROC (free-response receiver operating characteristic) analysis 
of their methods based on the Nijmegen image database. We did not implement 
the statistical clusterization process described in [2]. This facilitates comparison 
because most detection methods do not utilize spatial interaction models, and 
the FROC analysis for the detection of individual microcalcifications is also 
provided in [5]. 

Given the limited length of this paper, we only mention that the presented 
method for microcalcification detection and other image processing functions 
are integrated into a complex system called Analogic Mammogram 
Diagnostic Workstation, which is fully equipped for mammogram digitization, 
archiving, processing and display. The system is built upon a database 
program for handling patients’ personal and clinical data, which has 
been used in everyday practice for more than 3 years at the National Institute of 
Oncology (Budapest, Hungary). The system was made suitable for automated 
film digitization, archiving on CDs and indexing. We have developed a general 
purpose graphical interface which can be accessed from the database program 
for image display, and which hosts the image processing algorithms. 

Our ultimate objective is to integrate with our system a specific hardware tool, 
the CASTLE chip, which is expected to be a very high-speed image processor 
and is now under development at our laboratory. The chip will operate on the 
principle of cellular neural networks |S|. 

2 Space-variant Noise Equalization 

Processing or display of digitized mammogram images is usually preceded by 
some transformation. A logarithmic transform of intensity values is common for 
digitized film images, but several other transforms (global or adaptive) have also 
been investigated so far. The iso-precision transform demonstrated by N. Karsse- 
meijer in fP and applied for noise equalization and microcalcification detection 
in mammograms [2] means that the noise level is estimated as a function of the 
gray-scale intensity of the original image, and a transformation is applied which 
equalizes the specific noise measure. In this section we evaluate the effect of dif- 
ferent image transforms on the performance of microcalcification detection. We 
compare a global linear transform (LIN), logarithmic transform (LOG), adaptive 
histogram equalization (AHE) |S|, iso-precision transform (IPA), adaptive noise 
equalization (ANE), and the succession of iso-precision transform and adaptive 
noise equalization (IPA+ANE). 

The difference between IPA and ANE is that while IPA uses a global estimate 
of noise level for each mammogram to normalize image features, the noise level is 
computed locally in the case of ANE. Therefore the noise estimate for ANE has to be 
chosen carefully: areas containing microcalcification clusters must not produce 
a higher noise measure merely because of the presence of microcalcifications. 
We have found that a statistical filter which replaces only local minimum 
values with the closest (higher or equal) intensity value in their neighborhood 
is only weakly correlated with the presence of microcalcifications. The 
noise estimate is then computed from the difference between the filtered and 
original images, averaged over a 21-by-21 window. 
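
A minimal sketch of such a local noise estimate is given below. It is an interpretation of the description above rather than the implementation used in the experiments: the 8-connected neighborhood, the treatment of image borders (windows are simply clipped) and the function names are assumptions. The default half-width of 10 pixels corresponds to the 21-by-21 window.

```python
def fill_local_minima(img):
    """Replace each local minimum by the lowest value among its 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            nb = [img[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2))
                  if (j, i) != (y, x)]
            if nb and img[y][x] <= min(nb):   # pixel is a local minimum
                out[y][x] = min(nb)           # closest higher-or-equal value
    return out


def local_noise(img, half=10):
    """Mean of (filtered - original) over a (2*half + 1)-square window."""
    h, w = len(img), len(img[0])
    filt = fill_local_minima(img)
    diff = [[filt[y][x] - img[y][x] for x in range(w)] for y in range(h)]
    noise = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [diff[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            row.append(sum(vals) / len(vals))
        noise.append(row)
    return noise


# Flat background with one dark pixel: only that pixel is changed by the filter.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 4
noise_map = local_noise(image, half=1)
```

Since bright spots are never local minima, they leave the filtered image unchanged, which is why the estimate is largely insensitive to the presence of microcalcifications.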

The comparison of the different transforms is performed by determining the 
number of detections made inside the marked clusters at a fixed number of 
total detections for each image. This can be adjusted by the proper setting of a 
contrast threshold. Here we used the local contrast measure defined in [2]. False 
negative clusters (i.e., in which less than two detections were made) are also 
considered in the comparison. 

The comparison results for 500 detections per image are shown in Table 1. It 
is clear from this list that there is no significant difference between the IPA, ANE 
and IPA+ANE noise estimation techniques. Indeed, more detailed analysis shows 
that some of the images perform better with the IPA transform, others with ANE 
noise estimation. This is true for higher contrast thresholds (i.e., fewer objects 
detected) as well. The logarithmic transform is found to be least appropriate for 
microcalcification detection with the highest number of false negatives and the 
lowest in-cluster/out-cluster detection ratio. The AHE transform may radically 
increase contrast in darker areas of the images. Therefore the threshold level 
has to be set relatively high in order to detect only 500 objects per image, and 
some clusters will be missed at places where the contrast stretching was not so 
extensive. The LIN transform gives the highest in-cluster/out-cluster detection 
ratio, which can be explained by the fact that all the other transforms tend 
to increase the contrast of darker areas, while microcalcification clusters are 
most often to be found in brighter regions. However, some clusters in darker 
regions will hardly be detected after this transform. If we want to obtain low 
false negative rates, the best choice seems to be one of the methods based on 
noise equalization (IPA, ANE, IPA+ANE). Because the look-up tables of the 
IPA transform are publicly available for the images in the Nijmegen database, 
in the following the IPA-transformed Nijmegen images are analyzed. 



Table 1. Comparison of different preprocessing transforms using the 40 images in the 
Nijmegen database. Results for linear (LIN), logarithmic (LOG) transforms, adaptive 
histogram equalization (AHE), the iso-precision transform (IPA), adaptive noise 
equalization (ANE) and the combination of the latter two (IPA+ANE) are shown. For a 
fixed number of detections per image (500), we calculated the average detections inside 
marked clusters and the number of missed clusters among the 104 clusters marked in 
the images 

                                 LIN   LOG   AHE   IPA   ANE   IPA+ANE 
detected objects in each image   500   500   500   500   500   500 
in-cluster detection average      95    61    76    75    72    75 
total number of FN clusters        4     7     2     0     0     0 

3 FROC Analysis of a Contrast Measure for 
Microcalcification Detection 

The local image feature we propose for measuring the contrast of a candidate 
microcalcification is computed as follows. Starting from the ‘center’ of the 
microcalcification, we apply a region growing algorithm which adds one pixel to 
the current set of pixels in every iteration. The ‘center’ is the brightest pixel of 
the region, and at each step the brightest pixel in the neighborhood of 
the current set is added. 

During the growing process, the average gray level of pixels within the region 
($m_{in}$) is calculated along with the average gray level of pixels in the 
neighborhood of the region ($m_{out}$). The contrast measure is defined to be the maximum 
difference $m_{in} - m_{out}$ during the iteration. Because the shape of 
microcalcifications varies a lot, this single measure may describe the contrast of this amorphous 
object better than a set of standard spatial and spectral domain measures. The 
quality of the proposed measure is demonstrated by FROC analysis below. 
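
The region growing procedure can be sketched as follows. The sketch is an illustration under stated assumptions, not the implementation evaluated in this paper: the neighborhood of the region is taken to be the set of 8-connected pixels bordering it, growth stops at a hypothetical maximum region size (`max_size`), and the seed is assumed to be the brightest pixel of the candidate.

```python
def contrast_measure(img, seed, max_size=30):
    """Maximum of (mean inside - mean of bordering pixels) during region growing."""
    h, w = len(img), len(img[0])

    def neighbours(p):
        y, x = p
        return [(j, i)
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != p]

    region = {seed}
    border = set(neighbours(seed))
    best = float("-inf")
    while border and len(region) <= max_size:
        m_in = sum(img[y][x] for y, x in region) / len(region)
        m_out = sum(img[y][x] for y, x in border) / len(border)
        best = max(best, m_in - m_out)
        # Add the brightest pixel bordering the current region.
        brightest = max(border, key=lambda p: img[p[0]][p[1]])
        region.add(brightest)
        border.remove(brightest)
        border.update(q for q in neighbours(brightest) if q not in region)
    return best


# Two bright pixels on a flat background: the contrast peaks once both are inside.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 50
image[2][3] = 50
contrast = contrast_measure(image, (2, 2))
```

In this toy image the difference is 35 after the first step and 40 once the region covers both bright pixels, after which it only decreases, so the measure adapts to the extent of the object without an explicit shape model.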

The power of the proposed local contrast measure in its ability to detect 
individual microcalcifications can be analyzed by constructing FROC (free-response 
receiver operating characteristic) curves |Z]. We calculate the true positive 
fraction (TPF) and false positive (FP) clusters as defined by N. Karssemeijer [2]. 
We note, however, that a cluster is expected to have at least 5 
microcalcifications over a 1 cm² area in jS|, and others apply even higher limits. 

We have found that, besides the local contrast, it is convenient to 
characterize the microcalcifications with a normalized second order central moment 
computed over a 9-by-9 window around the object, and a binary output line 
detector. These measures depend on the spatial distribution of intensity levels 
rather than on their contrast. In Fig. 1 we show the detection results of 
experiments for all 40 mammograms in the Nijmegen database. The results were 
obtained by using the local contrast measure (cont) and its combinations with 
the second order moment (m) and the binary output of the line detector (l). The 
following combined measures were formulated heuristically: 

1. cont/m 

2. cont-l 

3. cont/m-l 

As we have stated before, we did not take into consideration the spatial 
distribution of microcalcification candidates. Therefore, comparisons have to be 
performed at the level of the detection of individual microcalcifications. In Fig. 2 
we compare our results to the results of N. Karssemeijer [2], D. Meersman 
et al. |E], and Strickland et al. m-. It is clear from the comparison of FROC 
curves that at high true positive rates our measures for microcalcification 
detection are superior to the measures used by the other three methods. In the same 
figure we also provide detection results obtained with images from our own 
database. We have digitized 315 images of 114 patients using a Cobrascan CX-312T 
(RDI Inc.) X-ray film scanner at 300 dpi spatial and 12 bit intensity resolution, 
of which 59 images contained 75 microcalcification clusters. The FROC curve 
shown applies to these positive cases. The FP rate changes only slightly if the 
rest of the images are also processed. Intensity resolution was reduced to 8 bits 
after the ANE transform. 

Fig. 1. FROC analysis. The four curves show results obtained by different combinations 
of our measures: cont (o), cont/m (+), cont-l (x) and cont/m-l (*) 



4 Conclusions 

We have analyzed a set of preprocessing transforms with respect to their 
efficiency in microcalcification detection. Based on the results shown in Table 1 we 
may conclude that the iso-precision transform of intensities and the estimation 
of local noise (adaptive noise equalization) have very similar effects, even if they 
are performed one after the other. We have defined three local image features: 
a contrast measure, a second order moment (compactness), and a line detector. 
We have designed these measures specifically for characterizing 
microcalcification candidates, and our results demonstrated in Figs. 1 and 2 show that they 
perform better than other methods which were tested using the same image 
database. The quality improvement is most striking at high true positive rates. 
We also provide test results using a larger set of mammograms. 

Fig. 2. Comparison of FROC curves obtained for the detection of individual 
microcalcifications. Our results and three other results are shown based on the same image 
database: Karssemeijer (o), Meersman (+), Strickland (x) and our results (*). Broken 
lines show results obtained using our database 

Abstract. Quantification of the regularity of cell sizes and of the spatial 
arrangement of cells in corneal endothelia is of great importance 
in connection with stress situations such as cataract surgery, corneal 
transplantation or implantation of intra-ocular lenses. A new index of 
regularity of the spatial distribution of cell sizes in corneal endothelia 
is proposed. The corneal endothelium is described by means of a spatial 
marked point pattern (the cell centroids marked with the cell areas). The 
hypothesis of no dependency between marks and locations is tested by a 
Monte Carlo test. The new index is the p-value of the test validating the 
hypothesis. 

Pairs of endothelia from different eyes of the same person are 
compared in terms of the traditional measures (density, hexagonality and 
coefficient of variation) and the new index. Results show how the proposed 
index can discriminate subtle morphological changes that cannot be 
detected by the commonly used indices. 



1 Introduction 

The deepest part of the human cornea is a single layer of 400,000 to 500,000 
cells called the corneal endothelium. Cells are 4-6 µm in height and 20 µm in 
width, and their posterior surfaces are predominantly hexagonal when viewed 
under specular microscopy. This technique is used to study ’in vivo’ the size, 
shape and number of endothelial cells |H| • The normal endothelial cell density is 
3000 to 3500 cells per mm² in young adults. This number decreases by about 
two thirds in elderly patients. The endothelial cell population also decreases 
following stress situations such as cataract surgery, corneal transplantation and 
implantation of intra-ocular lenses. When endothelial loss occurs through aging 
or trauma, the endothelial response is enlargement and sliding of the existing 
cells to cover the area previously occupied by the lost cells. As a result of the 
spreading of the cells, their diameters double their normal size and cells lose 
their hexagonal appearance. When a critical cell density is lost corneal edema 
results, which leads to pain and poor vision. Some geometrical cell models have 
been proposed which contribute to the study of tissue morphogenesis. Interesting 
general references are I'Zlblvj . 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 418-423, 1999.

Three quantities are commonly used to describe the corneal endothelium: cell
density, hexagonality (percentage of cells with 6 neighbors) and the coefficient
of variation of cell areas. Cell density is a size parameter (number of cells per
unit area), whereas the other two are measures of variability (spatial regularity
and variation of the cell areas). The goal of this paper is to analyze the spatial 
distribution of cell sizes and to propose a new index that quantifies its variability. 

One specular image per eye was obtained with a specular microscope
(Topcon). A software tool provided by Topcon (Imagenet) was used to
process the original images. Figure 1 shows some examples. A color CCD
camera (charge coupled device, XC-711P, Sony, Tokyo, Japan) captured these
photographs of cells. The output of the camera was fed into a PIP-512/1024 video
digitizer (Matrox Electronic Systems Limited, Quebec, Canada). All images were
acquired at the same scale. The original grey level images were preprocessed
to obtain a labelled binary image in which each cell corresponds
to a different connected component, so that cells can be analyzed separately.
A software tool was developed using Vista version 2.1.0.

Section 2 proposes a new index that measures the spatial homogeneity of
the cell areas based on the theory of marked spatial point processes. In Sect. 3
this new index is compared with the three usual indices. Finally, conclusions are
discussed in Sect. 4.

2 Describing the Corneal Endothelium

The aim of this work is to analyze the cell size distribution by taking into
account the spatial arrangement of cells. Suppose N cells are observed completely
(i.e., they do not touch the image frame). The n-th cell is located by its centroid,
$s_n$, and its size is measured by its area, $Z(s_n)$. The part of the corneal
endothelium under study is described by means of the points $s_n$ marked with the
areas $Z(s_n)$. The set $\{(s_n, Z(s_n))\}_{n=1,\ldots,N}$ is a marked point pattern.
Different parts of the same corneal
endothelium produce different marked point patterns. From a probabilistic point
of view, such a pattern can be considered as a realization of a marked point process, i.e., a
random mechanism that produces a random set of points marked with random
values. A very good introduction to this subject can be found in 0. Regular 
areas uniformly located within the tissue means from a statistical point of view 
that the (random) areas are independent of the (random) locations. A Monte 
Carlo test of this null hypothesis (random locations are independent of the ran- 
dom marks) is used |ll4Ki| . 

A marked spatial point process is stationary and isotropic if its distribution
is invariant under translations and rotations of the locations $s_n$. See 0. It
has been assumed that the observed marked point patterns are realizations of
a stationary and isotropic marked point process. This can be justified by noticing
that the images are a small part of the whole endothelium (only 100 to 300
cells of the total 400,000 to 500,000 are observed) and that the relative position of the
microscope and the eye is unknown. In other words, similar results would have
been obtained by taking any other portion (possibly with a different orientation)
of the endothelium. Taking this into account, it is natural to choose the mark
variogram as the functional descriptor of each marked spatial point pattern. The
mark variogram is defined as 0:

\[ \gamma(\|h\|) = \operatorname{var}\bigl(Z(s + h) - Z(s)\bigr) \qquad (1) \]

where var(.) denotes the variance, h is a point of the 2-D Euclidean space and
$\|h\|$ is its modulus.

^1 Vista is a public domain library for image processing applications developed by Art
Pope at the Department of Computer Science, University of British Columbia.

Let $\gamma_1(t)$ ($t = 0, \ldots, t_{max}$) be the mark variogram estimated from the
observed marked point pattern, $\{(s_n, Z(s_n))\}_{n=1,\ldots,N}$. Under the above null
hypothesis, a similar marked point pattern should be expected when the observed
areas are randomly interchanged among the given locations. If $(\pi_i(1), \ldots, \pi_i(N))$
is a random permutation of $(1, \ldots, N)$ then a randomized marked point pattern
corresponds to $\{(s_n, Z(s_{\pi_i(n)}))\}_{n=1,\ldots,N}$. S permutations are generated (S = 99
in our examples). Let $\gamma_i$ be the estimated mark variogram for the i-th randomized
pattern ($i = 2, \ldots, S + 1$). The question is now: is $\gamma_1(t)$ similar to $\gamma_i(t)$ with
$i > 1$ and $t = 1, \ldots, t_{max}$? Let



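\[ \bar{\gamma}_i(t) = \frac{1}{S} \sum_{j \neq i} \gamma_j(t) \quad \text{and} \quad d_i = \int_0^{+\infty} \bigl(\gamma_i(t) - \bar{\gamma}_i(t)\bigr)^2 \, dt \qquad (2) \]

for $i = 1, \ldots, S + 1$. All rankings of $d_1$ are equiprobable under the above null
hypothesis. If $d_{(j)}$ denotes the j-th largest amongst the $d_i$, with $i = 1, \ldots, S + 1$,
then under the hypothesis of independence:

\[ P(d_1 = d_{(j)}) = \frac{1}{S + 1}, \qquad j = 1, \ldots, S + 1, \qquad (3) \]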

and rejection of the null hypothesis on the basis that $d_1$ ranks k-th largest or
higher gives an exact, one-sided test with p-value $k/(S + 1)$. The extension
to two-sided tests follows directly. The two-sided p-value is then given by the
expression:



\[ P_v = \begin{cases} \dfrac{2k}{S + 1} & \text{if } k < \dfrac{S + 1}{2}, \\[1ex] \dfrac{2(S - k)}{S + 1} & \text{otherwise.} \end{cases} \qquad (4) \]

From now on, Pv is called the randomized variogram index. If $\gamma_1$ is similar to the
$\gamma_i$'s with $i = 2, \ldots, S + 1$ then a high p-value, Pv, is expected. Lower values
of Pv mean that the original and the randomized spatial marked point patterns
are clearly different.
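
The ranking step and expression (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `randomized_index`, the input layout (observed statistic first) and the tie handling are hypothetical, and the final clipping at zero is a guard added here because expression (4), as printed, becomes negative when k = S + 1.

```python
def randomized_index(d):
    """Two-sided Monte Carlo p-value following expression (4).

    d[0] is the statistic of the observed pattern; d[1:] are the S
    statistics of the randomized patterns (hypothetical input layout).
    """
    s = len(d) - 1                      # number of randomizations, S
    # k is the rank of d[0] counted from the largest (k = 1: d[0] is largest).
    k = 1 + sum(1 for x in d[1:] if x > d[0])
    if k < (s + 1) / 2:
        p = 2 * k / (s + 1)
    else:
        p = 2 * (s - k) / (s + 1)
    return max(p, 0.0)                  # guard added in this sketch only
```

With S = 99, an observed statistic larger than all randomized ones gives k = 1 and an index of 0.02.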

The mark variogram (equation (1)) has been estimated using:

\[ \hat{\gamma}(t) = \frac{\sum_{i,j} 1(t - \delta < \|s_i - s_j\| < t + \delta)\,\bigl(Z(s_i) - Z(s_j)\bigr)^2}{\sum_{i,j} 1(t - \delta < \|s_i - s_j\| < t + \delta)} \]

An edge correction was not necessary to compare the estimated variograms from
the original and the randomized marked spatial point patterns, since the same
bias is introduced for all estimations.
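
The estimator and the mark randomization can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function names, the exclusion of the pairs i = j, the use of ordered pairs and the fixed random seed are assumptions made for the example.

```python
import numpy as np

def mark_variogram(points, marks, t_values, delta):
    """Mean of (Z(s_i) - Z(s_j))^2 over ordered pairs i != j whose distance
    lies in the band t - delta < ||s_i - s_j|| < t + delta, for each t."""
    points = np.asarray(points, dtype=float)
    marks = np.asarray(marks, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise distances
    sq = (marks[:, None] - marks[None, :]) ** 2    # squared mark differences
    off_diag = ~np.eye(len(points), dtype=bool)    # assumption: skip i == j
    gamma = np.full(len(t_values), np.nan)         # NaN where no pair falls in the band
    for n, t in enumerate(t_values):
        band = off_diag & (dist > t - delta) & (dist < t + delta)
        if band.any():
            gamma[n] = sq[band].mean()
    return gamma

def randomized_variograms(points, marks, t_values, delta, n_perm=99, seed=0):
    """Row 0: variogram of the observed pattern; rows 1..n_perm: variograms
    after randomly permuting the marks over the fixed locations."""
    rng = np.random.default_rng(seed)
    marks = np.asarray(marks, dtype=float)
    rows = [mark_variogram(points, marks, t_values, delta)]
    for _ in range(n_perm):
        rows.append(mark_variogram(points, rng.permutation(marks), t_values, delta))
    return np.vstack(rows)
```

Because the locations are kept fixed, the distance bands are identical for every row, which is why any edge bias affects the observed and randomized estimates in the same way.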

3 Results

How is Pv related to cell density, hexagonality and the coefficient of variation?
A population of 133 endotheliums was analyzed: 49 endotheliums correspond to
normal control cases and the other 84 images correspond to potentially pathological
eyes. The lowest correlation coefficients are between the density and the
other three parameters (first row of Table 1), since density is simply the ratio
between the number of cells and the total area and does not reflect any kind of
variability of shape, size or regular spatial disposition of cells. It is important to
note the high correlation between hexagonality and the coefficient of variation
of areas, which means that these two parameters describe, to some extent, similar
aspects of the image. In contrast, the new index Pv has lower correlations with
hexagonality and CV.



Table 1. Correlations between density, hexagonality, CV and Pv index

               Hexagonality   CV             Pv
Density        0.1269723      -0.08751652    -0.009889984
Hexagonality   -              -0.61856124    0.248192071
CV             -              -              -0.134639198



Table 2. Density, hexagonality, CV and Pv corresponding to the selected pairs of
corneal endotheliums. Rows labeled R (respectively L) correspond to the right eye
(respectively the left eye)

Pair     Density (cells/mm²)   Hexagonality (%)   CV (%)   Pv
1 (R)    3084.3                34                 36.3     0.08
1 (L)    3179.8                65                 35.3     0.78
2 (R)    2657.1                43                 42.6     0.06
2 (L)    2527.9                44                 36.8     0.00
3 (R)    3002.8                64                 35.0     0.58
3 (L)    2881.7                52                 31.9     0.02
4 (R)    1695.5                35                 47.7     0.5
4 (L)    1927.3                37                 52       0.00

A detailed analysis of 8 endotheliums with different pathologies follows.
These correspond to 4 patients (4 pairs of eyes). Remember that age has
a clear influence on the endothelium status.

In Fig. 1 (a and b) the endotheliums of a male aged 31 with two intraocular
lenses are shown. Table 2 (the first two rows) shows that neither the densities
nor the CVs differ between the right and left eyes. Only the hexagonality and the Pv
index show that the status of the left eye is better than that of the right eye.

Figure 1 (c and d) corresponds to a 36-year-old patient. Both eyes have undergone
an intervention. Table 2 (third and fourth rows) shows similar densities,
hexagonalities and CVs. These values reflect an irregular status of the endothelium,
which is also confirmed by the Pv index.

Fig. 1. Pairs of corneal endotheliums. Each row corresponds to a different patient

The third patient (Fig. 1, e and f) has a right eye considered normal and
an intraocular lens in the left eye. The three usual indices show that the status
of the right eye is slightly better; the value of Pv shows this difference clearly.

The last patient (eighty-four years old), both of whose eyes have had cataract
surgery, is shown in Fig. 1 (g and h). The three commonly used parameters do
not discriminate between these two situations (Table 2). However, a visual
inspection shows that the cell patterns are quite different. This difference is
detected by the Pv index (Table 2).

4 Conclusions 

A new index of spatial homogeneity of endotheliums has been proposed, the
randomized variogram index, Pv. Results show that it is able to detect subtle
morphological changes of cell dispositions that cannot be detected by the
usual indices. The Pv index is invariant under scale changes, translations and
rotations (in fact, robust against image distortions) and it is defined based on a
Monte Carlo test.

Abstract. This paper describes a new ultrasonic scoring system based
on the texture characteristics of ultrasonic liver images. This system
generates an ultrasonic disease severity (UDS) score that is highly correlated
with the computer morphometry (CM) score obtained from the
evaluation of liver fibrosis based on biopsy specimens. Essentially,
the UDS score is very similar to the CM score in its statistical presentation.
Therefore, the UDS score is defined mathematically with the CM score
as the scoring basis. As a result, the UDS score can faithfully reflect the disease
progression that is conventionally determined from the evaluation
of liver fibrosis. Promising results have been obtained in experimental
studies, and the system is currently undergoing extensive clinical experiments.



1 Introduction 

B-mode liver sonogram, the most frequently used diagnostic ultrasonic modal- 
ity, produces gray-scale images from echo signals arising from pulsed ultrasound 
beams propagating through soft tissues. The ultrasonic scans are highly opera- 
tor and instrument dependent because the characteristics of ultrasonic image are 
closely related to the attenuation and scattering properties. Therefore, current 
liver sonography is still a qualitative, or at best semi-quantitative image modal- 
ity. It depends on the physician to observe certain echotexture characteristics, 
such as texture coarseness, echogenicity and smoothness of inferior edge, from 
the liver images and to compare them in order to diagnose the liver states P . For 
some liver diseases, sonography does not yet produce a conclusive diagnosis.
Therefore, physicians have to examine further with invasive methods,
typically liver needle biopsy. Liver biopsy is the standard clinical routine for
diagnosing chronic liver diseases and for guiding and monitoring treatment, but
it carries associated morbidity (3%) and mortality (0.03%). Therefore, developing
a reliable, non-invasive and quantitative ultrasonic scoring system for evaluating 
histological changes in ultrasonic liver images is highly promising in diagnosing 
and monitoring chronic liver diseases. 

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 424- HM 1999. 

The key to establishing the ultrasonic scoring system is to find powerful
texture features that can reflect the progression of liver disease. From the
histological point of view, the progression of liver disease is mainly reflected in the
amount of fibrosis in the liver specimens. In the last decade, Knodell’s score was widely
used to measure liver fibrosis [2]. It uses only five numerical scores for staging
liver fibrosis based on the physician’s observation. Obviously, this is not enough
to develop a quantitative progression index for liver disease. In [3,4], we proposed
a quantitative index, called the computer morphometry (CM) score, that was more
reliable and effective than the conventional Knodell’s score for evaluating the
amount of liver fibrosis. Thus, the CM scores are used here as the criteria for
selecting powerful texture features from texture descriptors. As mentioned
previously, the CM score is closely related to the progression of liver disease. Thus,
it is a good basis for developing the ultrasonic scoring system for assessing
ultrasonic liver images.

A powerful ultrasonic scoring system should generate the disease severity 
score that matches the corresponding CM score as closely as possible. To estab- 
lish the correlation between the selected texture features and the corresponding 
CM score, the quadratic equations of the selected texture features are defined 
mathematically based on the CM scores in the training stage. The scoring cri- 
terion of assessing the ultrasonic liver image is the minimization of variation 
between the observed texture features and the texture features estimated by 
quadratic equations. The severity scores generated here are called ultrasonic
disease severity (UDS) scores. The intervals of UDS scores in different liver states
are also determined as the standard for classification. Experiments with forty 
test images demonstrate that the UDS scores generated from this system are 
significantly correlated with the CM scores of corresponding biopsy specimens. 
In addition, one hundred and twenty ultrasonic liver images are used to test the 
classification capability. The resulting correct classification rate was 86.7%.
These results suggest the possibility of replacing the invasive needle biopsy
examination with the system presented here.



2 Materials and Methods 

The major morphological change in the progression of chronic liver diseases is
that collagen fibers are increasingly present in liver specimens. Therefore, the
amount of liver fibrosis is a powerful index for quantitatively assessing chronic 
liver diseases. In the literature, echotexture was also reported to be very pow- 
erful for evaluating diffuse liver disease jS|. Thus, if we can establish the cor- 
relation between the measurements of echotexture and the amount of fibrosis 
in liver specimens, sonography will become an effective and non-invasive tool 
in the systematic assessment of chronic liver diseases. In jS|) we found that the 
co-occurrence matrix method and texture feature coding method are powerful 
texture descriptors for classifying chronic liver diseases. Therefore, these two
texture descriptors are used to establish correlations with the pathological
fibrosis measurement.

The disease severity of chronic liver disease is reflected by the amount of
liver fibrosis in the biopsy specimens. Therefore, it is necessary to develop an
objective system for measuring the amount of liver fibrosis. In earlier work, we
developed an automatic image analysis system, which consisted of a microscope, a
computer-driven slide-driver, and the software for image acquisition, processing
and data analysis. The image analysis procedures included color model selection,
histogram-based normalization, clustering, moment-preserving thresholding and
a ranking filter for tissue characterization. The computerized motor driver and
the x-y directional stage were designed and installed to move specimens on an
optical microscope and to compute the fibrosis index. The system was capable
of computing the ratio of fibrous area to the complete liver tissue area as an index
for assessing the amount of liver fibrosis. This index is called the computer
morphometry (CM) score. We also found that the CM score was superior to the
conventional Knodell’s score for evaluating liver fibrosis. The pathological
CM score is used for selecting powerful texture features.

In jOj, we found that the changes of ultrasonic liver texture in disease are 
more sensitively captured by co-occurrence matrix features computed 3 or 4 pixels
apart along the angular directions of 0 or 90 degrees. Thus, we use the two
displacements along the two directions to obtain four co-occurrence matrices.
Sixteen texture features can be extracted from these matrices. In addition, four
texture features from the texture feature coding method are also adopted. The four
texture features that are most significantly correlated with CM scores are selected as the
most powerful features for establishing the ultrasonic disease scoring system.
They are based on grey-level resolution similarity, entropy, correlation, and
angular second moment. Experimental results with forty samples show that the
severity scores generated by this system are more highly correlated with the
CM score than those designed with other texture features.
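
As an illustration of the co-occurrence features mentioned above, the following NumPy sketch builds a grey-level co-occurrence matrix for one displacement and computes entropy, correlation and angular second moment. It is not the authors' implementation: the function names are hypothetical, the matrix is symmetrized by assumption, the standard Haralick-style definitions are assumed, and the image is assumed to hold integer grey levels in the range 0 to levels - 1.

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Normalized, symmetric grey-level co-occurrence matrix for the
    displacement (dx, dy), with dx, dy >= 0 (dx along columns, dy along rows)."""
    img = np.asarray(img)
    h, w = img.shape
    a = img[0:h - dy, 0:w - dx]          # reference pixels
    b = img[dy:h, dx:w]                  # pixels displaced by (dx, dy)
    glcm = np.zeros((levels, levels), dtype=float)
    np.add.at(glcm, (a.ravel(), b.ravel()), 1.0)
    glcm = glcm + glcm.T                 # assumption: count pairs in both orders
    return glcm / glcm.sum()

def texture_features(p):
    """Entropy, correlation and angular second moment of a normalized matrix p.
    The correlation is undefined for a constant image (zero variance)."""
    i, j = np.indices(p.shape)
    asm = (p ** 2).sum()
    nz = p[p > 0]
    entropy = -(nz * np.log(nz)).sum()
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    corr = (((i - mu_i) * (j - mu_j)) * p).sum() / (sd_i * sd_j)
    return entropy, corr, asm
```

Under these conventions, the four matrices described in the text would correspond to the displacements (3, 0), (4, 0), (0, 3) and (0, 4).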

2.1 Ultrasonic Disease Scoring System 

In the literature, texture features have only been used to construct classification
systems for distinguishing the three liver states. Widely used texture classification
methods, including the minimum-distance classifier, Bayesian estimation,
k-nearest neighbor classifier and neural network, have been reported to be useful
in these studies. However, they only classify test samples into three disease
states. No quantitative measurement of disease severity has yet been generated
for assessing the progression of chronic liver disease. As mentioned above,
liver disease progression can be perceived and evaluated based on the amount
of liver fibrosis. Among conventional methods for liver fibrosis measurement,
the CM score is the most reliable and accurate [3]. Therefore, the proposed
scoring system was designed using the corresponding CM scores. For this purpose,
a system of quadratic equations is used to define the correlation between
the texture features and the corresponding CM scores. The design details are
described as follows.

Forty training samples, including the ultrasonic images and corresponding
needle specimens, were used to establish the ultrasonic scoring system in the
training stage. The selected texture features and the corresponding CM score of
the i-th training sample are denoted $(f_{1,i}, f_{2,i}, f_{3,i}, f_{4,i})$ and $t_i$.
The quadratic equations of the selected texture features with respect to the
corresponding CM score are defined in (1).

\[ \begin{aligned}
f_{1,i} &= a_1 t_i^2 + b_1 t_i + c_1 \\
f_{2,i} &= a_2 t_i^2 + b_2 t_i + c_2 \\
f_{3,i} &= a_3 t_i^2 + b_3 t_i + c_3 \\
f_{4,i} &= a_4 t_i^2 + b_4 t_i + c_4
\end{aligned} \qquad (i = 1, 2, \ldots, 40). \qquad (1) \]

The coefficients $a_j$, $b_j$, $c_j$ are determined using least squares estimation based
on (2).
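
The least squares step can be sketched with NumPy's polynomial fitting routine. This is an illustration under assumptions, not the authors' code: the function name `fit_quadratics` and the array layout are hypothetical, and an ordinary unweighted fit is assumed.

```python
import numpy as np

def fit_quadratics(t, f):
    """Least-squares fit of f[:, j] ~ a_j t^2 + b_j t + c_j for each feature j.

    t: CM scores of the training samples, shape (n,)
    f: selected texture features, shape (n, 4)
    Returns an array of shape (4, 3) whose rows are (a_j, b_j, c_j).
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    # np.polyfit accepts a 2-D y and fits one polynomial per column,
    # returning the coefficients with the highest power first.
    return np.polyfit(t, f, deg=2).T
```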

The quadratic equations defined in (2) are used to derive a disease severity
score for assessing the ultrasonic liver image. The resulting score is called the
ultrasonic disease severity (UDS) score. The assessment criterion for image X is based
on the minimization of the square error between the texture features of X and
the estimated texture features obtained by the quadratic equations. The square
error term is defined in (3).



\[ SE = \sum_{j=1}^{4} \bigl[ g_j - (a_j u^2 + b_j u + c_j) \bigr]^2 \qquad (3) \]

where $(g_1, g_2, g_3, g_4)$ are the texture features of X.

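Differentiating (3) with respect to the variable u and setting the derivative to
zero, we obtain equation (4); its root u that minimizes the square error term
determines the UDS score.

\[ 2 \Bigl( \sum_{j=1}^{4} a_j^2 \Bigr) u^3 + 3 \Bigl( \sum_{j=1}^{4} a_j b_j \Bigr) u^2 + \Bigl( 2 \sum_{j=1}^{4} a_j c_j + \sum_{j=1}^{4} b_j^2 - 2 \sum_{j=1}^{4} g_j a_j \Bigr) u + \Bigl( \sum_{j=1}^{4} b_j c_j - \sum_{j=1}^{4} g_j b_j \Bigr) = 0. \qquad (4) \]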
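
Minimizing the square error over u can be sketched numerically as follows. This is an illustration, not the authors' implementation: the cubic is obtained by differentiating the square error term directly, the function name `uds_score` and the array layout are hypothetical, and choosing among several real roots by evaluating the error at each one is an assumption.

```python
import numpy as np

def uds_score(g, coeffs):
    """Return the u minimizing SE(u) = sum_j [g_j - (a_j u^2 + b_j u + c_j)]^2.

    g: texture features of the image, shape (4,)
    coeffs: rows (a_j, b_j, c_j), shape (4, 3)
    """
    g = np.asarray(g, dtype=float)
    a, b, c = np.asarray(coeffs, dtype=float).T
    # Coefficients of dSE/du = 0 (up to a constant factor), highest power first.
    cubic = [2 * (a @ a),
             3 * (a @ b),
             2 * (a @ c) + b @ b - 2 * (a @ g),
             b @ c - b @ g]
    roots = np.roots(cubic)
    real = roots[np.abs(roots.imag) < 1e-9].real   # keep the real stationary points
    se = [((g - (a * u ** 2 + b * u + c)) ** 2).sum() for u in real]
    return float(real[int(np.argmin(se))])
```

A cubic with real coefficients always has at least one real root, so a candidate exists; the tolerance on the imaginary part is a pragmatic choice and may need adjusting for nearly repeated roots.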



3 Experimental Results and Conclusion 

In this study, we have successfully developed an ultrasonic scoring system to
assess the severity of chronic liver disease. The system integrates the techniques
of texture analysis with pathological CM score measurement. All
programs are coded in Visual C++ version 4.0 on a Pentium personal computer
under MS-Windows 95. The system provides user-friendly interfaces and
efficient computation for real-time clinical evaluation. Forty training samples
with ultrasonic images and corresponding needle specimens were collected from
forty patients, of whom thirteen were normal, nineteen had chronic hepatitis
and eight had liver cirrhosis. These training samples were used to
select powerful texture features, and then to establish the quadratic equations
for texture features based on CM scores in the ultrasonic scoring system. The
resulting quadratic equations are used to derive the UDS scores of liver images for
sequential assessment. Additionally, in conventional clinical diagnosis, physicians
always classify the ultrasonic liver image into one of three liver states. To
provide the standards for classification, the forty training images are also used
to determine the severity intervals of the UDS score for different liver states
according to their medical records. The intervals of UDS scores in different liver states
are determined by ANOVA and correlation analysis. The results are: normal
2.8832 ± 1.668, hepatitis 5.9296 ± 1.554 and liver cirrhosis 13.8257 ± 2.632. The
thresholds of UDS separating the three disease states are 4.54 (normal vs.
hepatitis) and 9.62 (hepatitis vs. cirrhosis) based on normal distributions with equal
standard deviation.
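
Applying the two reported thresholds amounts to a simple interval lookup, sketched below. The function name `liver_state` and the treatment of scores falling exactly on a threshold are assumptions of this example.

```python
def liver_state(uds, t_hepatitis=4.54, t_cirrhosis=9.62):
    """Map a UDS score to one of the three liver states using the two
    thresholds reported in the text."""
    if uds < t_hepatitis:
        return "normal"
    if uds < t_cirrhosis:       # assumption: a score equal to 4.54 counts as hepatitis
        return "hepatitis"
    return "cirrhosis"
```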

Forty ultrasonic images and their corresponding needle specimens are used as 
test samples to analyze the stability and accuracy of the proposed scoring system. 
The accuracy of the UDS score is verified by comparison with the pathological
CM scores. The Pearson correlation coefficient between the UDS scores and CM
scores is 0.8843 (p < 0.001). This significant correlation shows that the proposed
UDS scores can faithfully reflect CM scores, which are an important factor in
assessing the progression of chronic liver disease. In other words, the UDS score is
a powerful and stable index for assessing ultrasonic liver images. The results
also indicate that the system of quadratic equations is an appropriate method for
correlating the selected texture features and the corresponding CM scores.
Defining the relation between texture features and CM scores that makes the
ultrasonic system most effective remains an interesting topic for future work.

In clinical diagnostic practice, ultrasonic liver images are usually classified
into three disease states. An effective scoring system should avoid
misclassification, especially false-negative misclassification. The false-negative rate is the
probability of a misclassification in which patients are classified as being
normal or having mild disease while the actual diagnosis is a more severe disease.
A high false-negative rate represents a danger to patients when physicians use
this scoring system. One hundred and twenty ultrasonic liver images, verified by
needle biopsy, are used to test the discrimination capability. The classification
results are listed in Table 1. From the experimental results we find that the
false-negative rate is only 8.33% and the correct classification rate is 86.7%. This
is superior to conventional methods based on the co-occurrence matrix or
texture feature coding method jSj- 
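
The two rates can be read off a confusion matrix as sketched below. This is an illustration only: the function name is hypothetical, the states are assumed to be ordered by increasing severity, and both rates are assumed to be taken over all test samples. Any matrix passed to it here is made up for the example and is not the paper's data.

```python
def classification_rates(confusion):
    """confusion[i][j]: number of samples whose true state is i and whose
    assigned state is j, with states ordered by increasing severity.

    Returns (correct classification rate, false-negative rate).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    # False negatives: assigned state milder than the true state (j < i).
    false_neg = sum(confusion[i][j] for i in range(n) for j in range(i))
    return correct / total, false_neg / total
```

For a made-up matrix `[[8, 2, 0], [1, 7, 2], [0, 1, 9]]`, 24 of the 30 samples lie on the diagonal and 2 lie below it.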

In this paper, a quantitative ultrasonic scoring system is proposed based on
the characteristics of the echotexture of the liver. The system not only generates
quantitative indices to assess disease progression but also classifies ultrasonic liver
images. It is shown that the proposed system has the potential to become a valuable
clinical tool for liver diagnosis in the future. Several characteristics of liver tissues
have been used to evaluate the degree of diffuse parenchymal liver disease,
including the smoothness of the liver surface, echogenicity, echotexture and backscattering
parameters. However, the system proposed here uses only echotexture
information. Whether this system’s performance can be enhanced by integrating
other tissue characteristics is an interesting topic for further studies.



Table 1. The confusion matrix of forty test patients is shown. The left column indicates 
the true liver states of the test samples while the upper row indicates the corresponding 
classification results. Correct classification rate is 86.7%. The false-negative rate is only 
8.33% 

Normal    37

32

Abstract. Automatic volumetric measurement of brain structures and
substructures is a prerequisite for longitudinal studies as well as studies
aimed at measuring and quantifying differences between populations.
This study tests the hypothesis that a fully automatic, atlas-based method
can be used for the computation of the volume encompassed by the
dura, the volume of the brain, and the volume of the cerebellum from 
which indices of atrophy are estimated. The method has been tested 
on normal volunteers and alcoholic patients. It has been validated both 
by comparing contours obtained manually and automatically and by re- 
peating the measurements on serial acquisitions. Results demonstrate 
that the method is both robust and accurate, even in the presence of 
large morphological differences due to severe atrophy caused by chronic 
alcoholism. 



1 Introduction 

A number of atlas-based methods have been proposed in the recent past to 
label and segment structures and substructures in medical images. These
techniques involve the segmentation of a reference volume and its non-rigid reg- 
istration to the volume to be segmented. Possible approaches include the use of 
landmarks in which the deformation is computed based on control points and 
interpolated through the remainder of the volume. However, the automatic or
semi-automatic identification of these control points remains challenging. Other tech-
niques attempt to maximize intensity similarity on a voxel-by-voxel basis. These 
methods have the advantage of being fully automatic but they may be affected 
by large morphological differences between brain volumes. Results reported in 
the literature typically involve normal subjects or patients with pathologies that 
do not drastically alter the shape of the brain, such as schizophrenia or epilepsy. 
In these applications, small deformations are sufficient to warp one brain onto 
the other. In contrast, the study presented herein involves chronic alcoholics with 
very severe brain atrophy. Severe atrophy considerably reduces the size of the
cerebellum and enlarges the sulci and the ventricles. This decreases the similarity 
between the atlas and the subject volumes, thus challenging deformation algo- 
rithms. This work tests and evaluates the robustness of an automatic method 
for the computation of pre-atrophy brain volumes and the post-atrophy brain 
and cerebellum volumes. 



2 Methods 

2.1 Data Sets 

Seven normal volunteers and seven patients with a history of alcoholism were 
used in this study. Multiple 3-D magnetic resonance (MR) image volumes were
obtained of each subject. Normal subjects were scanned three times within a 
period of three weeks (n=5) or within a period of 5 months (n=2). Alcoholic 
subjects were admitted to a detoxification program, and the first scan was ob- 
tained within 5 days of abstinence. The second scan was obtained within one 
month, followed by a third scan at approximately 3 months after the first scan. 
An additional image volume obtained with the same imaging parameters was 
used as an atlas. All image volumes were acquired with a General Electric 1.5 
Tesla Signa MR scanner using a spoiled gradient echo pulse sequence. Each vol- 
ume consists of 124 sagittal slices, and each slice has dimensions of 256 x 256 
pixels. Voxel dimensions were 0.94 x 0.94 x 1.3 mm³.



2.2 Image Registration 

The registration algorithm consists of two major steps. First, a seven-parameter
(three rotation angles, three translation components, and one scaling factor)
transformation that brings the two volumes into global correspondence is computed.
Next, the volumes are deformed using a non-rigid transformation to bring these
two volumes into local correspondence. Both of these steps are fully automatic.
Because the method used in step (2) is also used to compute the transformation
in step (1), the local transformation method is described first. All the algorithms
used in this study were written in IDL (Interactive Data Language, Research 
Systems, Inc.) and executed on a Sun Ultra 1 workstation (Sun Microsystems, 
Mountain View, CA). 

Local Registration: Recently, Thirion [2] presented the problem of image
matching in terms of demons (by analogy with Maxwell’s demons). This is a 
general framework in which object boundaries in one image are viewed as semi- 
permeable membranes. The other image, considered as a deformable grid, diffuses 
through these interfaces driven by the action of effectors (the demons) situated 
within the membranes. Various kinds of demons can be designed to apply this 
paradigm to specific applications. In the particular case of deformations based 
on voxel-by-voxel intensity similarity the demons paradigm is similar to optical
flow methods. An independent implementation of this approach
has been used in this study jS|. This algorithm results in a deformation field
(i.e., a displacement vector for every voxel in the volume) that can be used to
warp one image onto the other.

Global Registration: Prior to applying the deformation
algorithm, the images to be matched are brought into approximate
correspondence using a seven-degrees-of-freedom transformation. Displacement
vectors computed as described in the previous paragraph are used to identify a
set of points in the first image and a corresponding set of points in the second
image. These homologous points are then used to compute the global
transformation. Typically, the global transformation computed with this approach is not
as accurate as one computed with other methods, such as mutual information.
However, it has the advantage of being fast and is sufficiently accurate to serve
as a reliable starting point for the deformation algorithm.
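
The text does not say how the seven parameters are estimated from the homologous points. One common choice, swapped in here purely for illustration, is the closed-form least-squares similarity fit based on the singular value decomposition (often attributed to Umeyama). The function name and array layout below are hypothetical.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares scale s, rotation R and translation t such that
    dst ~ s * R @ src + t.

    src, dst: homologous points, shape (n, 3).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                     # cross-covariance matrix
    u, sig, vt = np.linalg.svd(cov)
    sign = np.ones(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:   # avoid returning a reflection
        sign[-1] = -1.0
    rot = u @ np.diag(sign) @ vt
    var_s = (xs ** 2).sum() / len(src)             # variance of the source points
    scale = (sig * sign).sum() / var_s
    trans = mu_d - scale * (rot @ mu_s)
    return scale, rot, trans
```

The rotation matrix carries the three angles, so the result has the seven degrees of freedom described above: three rotations, three translations and one scale.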

2.3 Segmentation 

The atrophy indices of interest require pre- and post-atrophy brain volumes as 
well as cerebellum volume. Pre-atrophy brain volumes are difficult to obtain, so 
instead, the intra-dural volume was used as the reference to which brain volumes 
are compared. The intra-dural volume in the atlas was determined by careful 
manual delineation. Contours were outlined in each slice of the sagittal volume, 
and a binary mask of the intra-dural volume was created. This same method was 
repeated to obtain a binary volume of the cerebellum (both hemispheres) in the 
atlas volume. The region was segmented to include the entire cerebellum region, 
and individual folia were not followed. Note that the intra-dural mask also includes
the cerebellum. In order to segment the intra-dural region and the cerebellum in
subject volumes, the atlas was first registered to each volume. The deformation 
field was then applied to the binary atlas volumes to create intra-dural and 
cerebellum masks in each individual volume. 

2.4 Volume Measurements 

The intra-dural brain volume of each subject is determined simply by the volume 
of the mask created by projecting the atlas mask onto each individual volume. 
The brain volume (white and gray matter) is obtained by thresholding the intra- 
dural image to eliminate cerebrospinal fluid. The threshold value was manually 
chosen in the atlas volume. In order to compensate for inter-scan intensity variations,
this threshold level was automatically adjusted to the proper value for
each subject volume using a histogram equalization technique. This threshold was
then applied to the segmented intra-dural images, and a brain volume was de- 
termined. The cerebellum volumes were computed in the same manner as the 
brain volumes, using a separate intensity threshold. 
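
The volume measurements described above reduce to counting voxels inside a mask and, for brain tissue, counting only the voxels that pass an intensity threshold. The following is a minimal sketch in Python with NumPy, not the authors' implementation. The function names, the voxel size and the toy image are hypothetical, and it assumes that tissue is brighter than cerebrospinal fluid.

```python
import numpy as np

def mask_volume_ml(mask, voxel_size_mm):
    """Volume of a binary mask in millilitres (1 ml = 1000 mm^3)."""
    voxel_mm3 = float(np.prod(voxel_size_mm))
    return mask.astype(bool).sum() * voxel_mm3 / 1000.0

def tissue_volume_ml(image, mask, threshold, voxel_size_mm):
    """Volume of the voxels inside the mask whose intensity exceeds the threshold."""
    tissue = mask.astype(bool) & (image > threshold)
    return mask_volume_ml(tissue, voxel_size_mm)

# Toy example (hypothetical values): a 10x10x10 intra-dural mask, half of it bright tissue.
image = np.zeros((20, 20, 20))
mask = np.zeros(image.shape, dtype=bool)
mask[5:15, 5:15, 5:15] = True
image[5:15, 5:15, 5:10] = 100.0   # assumed "tissue" intensity
image[5:15, 5:15, 10:15] = 20.0   # assumed "CSF" intensity

intra_dural = mask_volume_ml(mask, (1.0, 1.0, 1.0))
brain = tissue_volume_ml(image, mask, threshold=50.0, voxel_size_mm=(1.0, 1.0, 1.0))
```

The ratio of `brain` to `intra_dural` is then the kind of atrophy index that uses the intra-dural volume as its reference.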

3 Results 

Figure 1 illustrates qualitatively the type of results that were obtained. The 
left panel shows one slice in the atlas volume. The right panel shows the slice 
with the same index in one of the patient volumes. Observe the large amount
of atrophy (enlarged sulci and ventricles and atrophied cerebellum) visible in 
the patient volume. The middle panel shows the slice with the same index in the
volume obtained by warping the normal brain volume onto the atrophied brain 
volume. After deformation, the ventricles in the normal brain volume have been 
dramatically enlarged, the thickness of the corpus callosum has been reduced, 
sulci have been enlarged, and the overall shape of the head has been modified, 
but the integrity of the cortical structures has been preserved.

Fig. 1. Results of the elastic registration algorithm

Figure 2 illustrates representative results for the automatic segmentation of the cerebellum.
This figure shows one slice in each of three alcoholic subject volumes with the 
cerebellar contours obtained with the automatic technique overlaid in white. 
Observe the ability of the algorithm to produce accurate results even when the 
shape and orientation of the cerebellum varies greatly from one volume to the 
other.

Fig. 2. Automatic cerebellum segmentation results for three alcoholic subjects

To evaluate our results quantitatively we differentiate between repeatability
and accuracy. The data set used in this study includes three acquisitions
per subject (both for the normal and the alcoholic volunteers). This permits
the evaluation of the consistency and repeatability of our measurements. Indeed,
changes are not expected in the volume encompassed by the dura in either the 
normal or the patient population and only minor changes (if any) are expected 
in the brain and cerebellar volumes for the normal population. Changes related 
to abstinence may be observed both in the brain and the cerebellum volumes for 
the patient population. Consistent values for structures that are not expected 
to change in serial scans of the same subject are thus good indicators of the 
reliability of our measurements. Accuracy has been assessed by comparing the 
results obtained automatically to results obtained by manual delineation. 



3.1 Repeatability 

Figure 3 shows the intra-dural volumes obtained for both the normal and the 
patient population. For each subject the figure shows the volume computed for 
each acquisition. Space restrictions preclude the inclusion of similar figures for 
the cerebellar volumes but results were comparable. 

Fig. 3. Intra-dural volumes, determined automatically, for each normal and alcoholic
subject used in this study 



3.2 Accuracy 

For each and every volume, four slices were segmented manually (two for the 
brain and two for the cerebellum). These slices were chosen by determining the 
range on which the structures were visible in the image volumes and randomly 
selecting two slices per structure within this range. Ranges, and therefore se- 
lected slices, were different for the cerebellum and for the brain. The similarity 
between contours obtained manually and contours obtained automatically was
computed using a similarity index S derived from the kappa statistic. This
index varies between 0 and 1 (1 indicates perfect agreement between two contours
while 0 indicates no overlap) and is sensitive to both differences in size and
structure orientation. This strategy resulted in 84 brain and cerebellum contours
for the normal population and 84 contours for the alcoholic subjects. The mean 
similarity indices for the normal subjects were 0.98 for the intra-dural volume 
and 0.95 for the cerebellar volume. These indices were 0.97 and 0.94 for the 
alcoholic subjects. 
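
The text does not spell out the formula for S. A widely used similarity index derived from the kappa statistic is S = 2|A ∩ B| / (|A| + |B|), and the sketch below assumes that definition. The function name and the toy masks are illustrative only.

```python
import numpy as np

def similarity_index(a, b):
    """S = 2|A ∩ B| / (|A| + |B|) for two binary masks (assumed definition)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # two empty masks agree trivially
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy example: two 4x4 squares of equal size, one shifted by a single column.
manual = np.zeros((8, 8), dtype=bool)
manual[2:6, 2:6] = True
automatic = np.zeros((8, 8), dtype=bool)
automatic[2:6, 3:7] = True

s = similarity_index(manual, automatic)  # 12 shared pixels: 2 * 12 / (16 + 16) = 0.75
```

With this definition a pure shift lowers S even though the two regions have the same size, which matches the statement that the index is sensitive to differences in both size and location.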

4 Discussion 

This study demonstrates that fully automatic, robust, and accurate segmenta- 
tion of the whole brain and cerebellum can be accomplished using atlas-based 
methods. To the authors’ knowledge, this is the first time that results have been 
reported on a study involving atlas-based segmentation of brains with patholo- 
gies that alter brain morphology to the extent observed in this data set. Intra- 
dural volumes demonstrate the excellent repeatability of the results. Accuracy 
was tested by comparing contours delineated manually and contours delineated 
automatically. Arguably, manually delineated contours are not the ultimate gold 
standard. But, in this case the contours were drawn by the same rater on the 
atlas and on each individual slice used for the evaluation. The entire atlas was 
also delineated twice and similarity indices of 0.98 and 0.96 were observed for 
the intra-dural and cerebellum volumes, respectively. The average similarity indices
we have observed between manual and automatic delineation on the slices
selected for evaluation are therefore comparable to the intra-rater variability.
Thus, results obtained on this data set support the hypothesis that automatic 
delineation is as reliable and accurate as manual delineation when the manual 
segmentation is performed by a single individual. 

Abstract. Slow rotation acquisition of dynamic data has several advantages
over fast rotation acquisition, which is currently the method of
choice used for the acquisition of dynamic data in SPECT. Slow rota- 
tion is currently not used because of error from inconsistent data. In 
this work, we develop a method of reconstructing from projections that 
are inconsistent in time due to being acquired during a slow acquisition. 
Our method is based on a factor model of physiological data. A series 
of dynamic images is reconstructed, where each reconstructed image
corresponds in time to only one projection. Such an under-determined 
reconstruction is shown to be possible through utilization of a factor 
model. Computer simulations are performed using simple phantoms. We 
found that we are able to accurately reconstruct the dynamic sequence for 
simple phantoms with temporal behavior corresponding to teboroxime- 
Tc-99m heart imaging. 



1 Introduction 

Single Photon Emission Computed Tomography (SPECT) can be used to acquire
dynamic data. The acquisition protocol is usually based on the use of a fast 
camera rotation. Such acquisitions are made only with multiple detector cameras 
which have the ability to rotate quickly and acquire consecutive, complete sets 
of tomographic projections. A complete set of tomographic data is acquired 
over a very short period of time (approximately 10 seconds) and the resulting 
number of counts in the acquired projections is very low. The tomographic sets 
are reconstructed to form a series of dynamic images which are very noisy due 
to low projection counts. The assumption made during reconstruction is that 
radionuclide distribution remains constant during the acquisition of one set of 
projections. This approximation may be unreliable, especially in the time just 
after injection when changes of activity in the object are very fast. In our lab, 
with a Picker 3000XP, the best temporal resolution with fast rotation was 5.7 
seconds. Another important aspect of fast rotation acquisition is the amount 
of computer time and disk storage needed to process a dynamic study. Each 
tomographic set of projections must be stored and then reconstructed. For the
above reasons, dynamic SPECT with fast rotation is a difficult and computationally
expensive method.

There are a variety of methods in PET and SPECT which estimate kinetic
parameters directly from projections. They require prior reconstruction of tomographic
sets in order to estimate the object boundary [2], or to estimate the
initial set of factors through SVD analysis of reconstructed dynamic sequences.

In this paper we consider a different approach to dynamic SPECT. In this 
approach the acquisition of the dynamic SPECT data is similar to the standard 
static acquisition of SPECT data. Only a small number of rotations of the
camera is required during the scan. Such an acquisition type creates time incon- 
sistencies in the projections, i.e. each projection “sees” different activity in the 
object. 

We propose a reconstruction technique which reconstructs a sequence of dy- 
namic images from these inconsistent projections using a factor model of the 
physiological images. Each reconstructed dynamic image corresponds in
time to only one projection, unlike in the fast rotation case, where one dynamic
image corresponds to a tomographic set of projections. By using a short time 
per projection, this method can provide a much better temporal resolution than 
that obtained with fast rotation. An important advantage of this method
is that it does not require a system with three or more detectors; it can be used with
a two-detector or single-detector system. Only positivity constraints are put
on the temporal or geometrical representation of the factors, and no a priori 
information is used in the reconstruction. 

The time activity curves (TACs) can be determined from the sequence of 
reconstructed images by using region of interest (ROI) measurements or factor 
analysis of dynamic structures (FADS). We used the ROI technique for the
extraction of TACs which then were used for the evaluation of the reconstruction 
method presented in this paper. 

We present the results of our reconstruction method from a simulation of a 
simple phantom. A comparison is made between two different types of acquisition 
protocols and two different reconstruction parameters. 



2 Methods 

The reconstruction was done by constructing a least squares objective function
where forward projection was modeled assuming a factor model of the data:

$$ f(C,F) = \sum_{j,t=1}^{M,K} \frac{\left( \sum_{i}\sum_{p=1}^{P} a_{ji}(t)\, C_{ip}\, F_p(t) - P_j(t) \right)^2}{P_j(t)} \qquad (1) $$

where $F_p(t)$ is the value of factor p at time t and $C_{ip}$ is the geometrical definition of
the factor; i is a pixel index. Here a is the tomographic system matrix, and $P_j(t)$ is
the number of counts measured in bin j at projection (or time when projection
was taken) t.
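
To make the structure of the objective function concrete, here is a minimal NumPy sketch of Eq. (1), not the authors' code. The array layout, the function name and the small `eps` guard against empty bins are assumptions introduced for illustration.

```python
import numpy as np

def objective(C, F, A, P, eps=1e-12):
    """Weighted least-squares objective of Eq. (1).

    C: (n_pixels, n_factors)        geometrical factor definitions C_ip
    F: (n_factors, n_times)         factor values F_p(t)
    A: (n_times, n_bins, n_pixels)  system matrix a_ji(t), one slice per projection
    P: (n_bins, n_times)            measured counts P_j(t)
    """
    images = C @ F                               # one image per projection time
    model = np.einsum("tji,it->jt", A, images)   # project image t with the geometry of time t
    # eps is an assumed guard so that empty bins do not cause a division by zero.
    return np.sum((model - P) ** 2 / np.maximum(P, eps))

# Toy example with random, hypothetical sizes and values.
rng = np.random.default_rng(0)
n_pixels, n_factors, n_times, n_bins = 6, 2, 5, 4
C = rng.random((n_pixels, n_factors))
F = rng.random((n_factors, n_times))
A = rng.random((n_times, n_bins, n_pixels))
P = np.einsum("tji,it->jt", A, C @ F)            # noise-free projections

value_at_truth = objective(C, F, A, P)
value_perturbed = objective(1.1 * C, F, A, P)
```

Each projection time has its own image and its own projection geometry, which is why the data are under-determined without the factor model.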

The minimization of (1) will yield the values of C and F, but these values are
not physiologically meaningful since: (a) in general, the number of factors used
for the forward projection in (1), P, is different from the number of physiological
factors and (b) the results of the minimization, matrices C and F, are not
mathematically unique. In all simulation experiments presented in this paper,
the number of factors used in the reconstruction was equal to or higher than 
the number of physiological factors in the analyzed study. The physiologically 
meaningful result of this method is the dynamic sequence of images which is a 
result of C • F multiplication. Although in this paper we consider only a 2D case,
this method has a straightforward extension to 3D. All equations in this paper are
valid for the 3D case. The objective function was minimized by use of the conjugate
gradient method. The non-negativity constraints were imposed on C and
F by adding to the objective function a term which penalized negative values of 
these matrices. 
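
The form of the penalty term is not given in the text. One common choice, assumed here purely for illustration, is a quadratic penalty on the negative entries of C and F. The function name and the `weight` parameter are hypothetical.

```python
import numpy as np

def negativity_penalty(C, F, weight=1.0):
    """Quadratic penalty that is zero when all entries of C and F are non-negative."""
    neg_C = np.minimum(C, 0.0)   # keeps only the negative part of C
    neg_F = np.minimum(F, 0.0)   # keeps only the negative part of F
    return weight * (np.sum(neg_C ** 2) + np.sum(neg_F ** 2))

# Toy example: one entry of -2 in C and one entry of -1 in F.
C_example = np.array([[-2.0, 1.0]])
F_example = np.array([[3.0], [-1.0]])
penalty = negativity_penalty(C_example, F_example)   # (-2)^2 + (-1)^2 = 5
```

A penalty of this kind is differentiable, so it can be added to the objective and minimized with the same conjugate gradient routine.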

Preliminary computer simulations were performed in order to verify the reconstruction
method. A simple phantom consisting of 4 squares was used. These squares
corresponded to blood, myocardium, liver, and right ventricle. Their geometrical
representation is presented in Fig. 1(a). In all simulations, uptake of Teboroxime-Tc-99m
in the myocardium was simulated using a two compartmental model
with wash-out $k_{12} = 0.4\ \mathrm{min}^{-1}$, wash-in $k_{21} = 0.8\ \mathrm{min}^{-1}$, and fraction of
blood in the tissue $f_v = 0.15$. Simulations were performed in 2D using 64x64
pixel images. Simulations were performed without noise and with Poisson noise 
added to the projection data. The total number of counts in each sinogram for 
the simulation with noise was equal to 3.8x10®. Ten realizations of the noise in 
the sinograms were performed. The data acquisition was performed assuming 
two detector heads positioned with a relative angle of 90 deg; 183 projections
per head were simulated. Each head made a total of 3 rotations during the acquisition.
The projections were generated in a 64 bin matrix. Simulations with
noise were performed in 2 modes of acquisition. In the first mode, the time per 
projection, 6 seconds, was the same for all projections. For the second mode, 
the time per projection was 2 seconds for the first 61 projections, and it was 
increased to 6 seconds for the next 61 projections, and the final 61 projections 
lasted 10 seconds each. The non-uniform time per projection acquisition mode 
was used to increase the temporal resolution at the beginning of the acquisition 
when changes in radionuclide distribution were most rapid. 
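
The two acquisition modes can be written out as lists of projection durations. The sketch below is illustrative: the helper name is hypothetical and it assumes no dead time between projections. It shows that both modes take the same total time, so the non-uniform mode only redistributes the sampling.

```python
import numpy as np

def projection_start_times(durations_s):
    """Start time of each projection, assuming projections follow each other without gaps."""
    d = np.asarray(durations_s, dtype=float)
    return np.concatenate(([0.0], np.cumsum(d)[:-1]))

# Uniform mode: 183 projections of 6 s each.
uniform = np.full(183, 6.0)

# Non-uniform mode: 61 projections each of 2 s, 6 s and 10 s.
non_uniform = np.concatenate((np.full(61, 2.0), np.full(61, 6.0), np.full(61, 10.0)))

total_uniform = uniform.sum()            # 183 * 6 = 1098 s
total_non_uniform = non_uniform.sum()    # 61 * (2 + 6 + 10) = 1098 s
starts = projection_start_times(non_uniform)
```

In the non-uniform mode the first 61 projections cover only the first 122 seconds, which is where the activity changes fastest.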

The reconstructed dynamic sequences had 183 images (the same as the num- 
ber of projections), each of size 64 x 64 pixels. The TACs for the 4 different com- 
ponents: myocardium, blood, liver, and right ventricle, were extracted from re- 
gion of interest (ROI) measurements from the reconstructed dynamic sequences. 
Geometrically, the ROIs were defined as the 8x8 squares positioned in the 
center of each of the 4 components. The kinetic parameters of the simulated 
uptake of teboroxime in the heart were also calculated using the RFIT fitting
program from the obtained TACs. Parameters ($k_{12}$, $k_{21}$, $f_v$) of the compartmental
model, and their standard deviations, were calculated for each ROI. The
standard deviations were calculated based on the 10 realizations of noise in the 
projections. 

3 Results

The simulations without noise gave an exact match between the simulated curves
and the curves obtained by the method. These results are not presented.
Figure 1 shows sinograms of the noisy square phantom. The time per projection 
was 6 seconds for each projection in sinogram (b) and varied from 2 seconds to 
6 seconds to 10 seconds in the three sections of sinogram (c) . It is apparent from 
the sinograms that for the first frames there are rapid changes of activity which 
become smaller for later time frames. The different times per projection cause the
discontinuity in the sinogram seen in Fig. 1(c).

Fig. 1. Simulated object (a). The sinograms for one detector obtained for uniform (b)
and non-uniform (c) temporal sampling. The reconstructed images from sinogram (b)
are presented in (d). Only 12 of the 183 reconstructed images are shown. Images in (d)
correspond to times marked on the sinogram (b) by arrows



The reconstruction from the sinogram in Fig. 1(b) is presented in Fig.
1(d). Only a small number of reconstructed images are shown (there were a total of
183 reconstructed images). Images in Fig. 1(d) correspond to projections marked 
by arrows on the sinogram in Fig. 1(b). 

Fig. 2. TACs obtained by ROI measurements from the reconstructed series of images. The
temporal behavior of the factors corresponded to the uptake of Teboroxime-Tc-99m in the heart:
(a) corresponds to myocardium, (b) to blood, (c) to liver, and (d) to right ventricle



Figure 2 shows the comparison between the simulated curves and the TACs 
obtained using our method. The ROI curves were obtained from dynamic se- 
quence reconstructions from one noise realization of the projection data. Figure 
2 presents results for non-uniform temporal sampling. The values of TACs in 
Fig. 2 were scaled, i.e. the values of the TAC for the projections with 2 second
duration were multiplied by 3, and values corresponding to 10 second duration 
were divided by 1.67, so that all projections corresponded to 6 second duration. 
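
The scaling described above normalizes every TAC value to a common 6 second projection duration. A minimal sketch follows, with a hypothetical function name and hypothetical count values.

```python
def scale_to_reference(counts, duration_s, reference_s=6.0):
    """Rescale counts acquired over duration_s to the equivalent for reference_s."""
    return counts * reference_s / duration_s

# Hypothetical TAC values: the same count rate observed with three projection durations.
scaled_short = scale_to_reference(10.0, 2.0)    # multiplied by 6 / 2 = 3
scaled_mid = scale_to_reference(30.0, 6.0)      # unchanged
scaled_long = scale_to_reference(50.0, 10.0)    # multiplied by 6 / 10, i.e. divided by about 1.67
```

After scaling, all three values describe the same count rate, which is what makes the three sections of the curve comparable.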

Parameters of the kinetic model were calculated for each noise realization of
the data from the TACs of the myocardium and blood. The results are summarized
in Table 1. The standard deviations of the wash-in and wash-out parameters
and the bias of $f_v$ were decreased by using non-uniform sampling. There is no visible
difference between the reconstructions with P = 4 and P = 5.

Table 1. Calculated kinetic parameters with standard deviations

                                       k12 [min^-1]     k21 [min^-1]     fv
Simulated                              0.40             0.80             0.15
Uniform temporal sampling      P=4     0.399 ± 0.029    0.802 ± 0.061    0.097 ± 0.023
                               P=5     0.392 ± 0.034    0.785 ± 0.059    0.093 ± 0.023
Non-uniform temporal sampling  P=4     0.390 ± 0.017    0.789 ± 0.034    0.128 ± 0.024
                               P=5     0.388 ± 0.021    0.780 ± 0.047    0.117 ± 0.029

The values of the calculated kinetic parameters agreed with simulated values 
within the standard deviation. Standard deviations were calculated over multiple 
noise realizations of the projection data. Use of non-uniform temporal sampling 
improved the temporal resolution of the dynamic acquisitions and often improved 
the precision and accuracy of kinetic parameters obtained (Table 1). 

In future studies we plan to investigate the use of different methods for 
minimization of the objective function. We intend to optimize the slow rotation 
acquisition protocol using computer simulations with a more realistic anatomic 
phantom. Finally, experimental validation of this method will be performed for 
teboroxime-Tc-99m heart and MAG3-Tc-99m renal studies in animals and in 
patients. 

Abstract. In nuclear medicine, simultaneous dual-isotope imaging is
used to determine the distribution of two radiotracers from a single ac- 
quisition and for emission/transmission (E/T) imaging in SPECT. How- 
ever, no general solution to the cross-talk problem caused by scattered 
and unscattered photons has been found yet and accurate quantification 
cannot be performed. We describe a general method of spectral factor 
analysis (SFA) for multi-isotope acquisitions. SFA corrects for cross-talk
due to unscattered and scattered photons in planar or SPECT imaging
involving two or more radiotracers and for E/T scans. A Tc-99m/I-123
phantom study shows that quantitative accuracy is within 10% with
SFA, while errors up to 170% are observed using conventional spectral
windows. 



1 Introduction 

In nuclear medicine, simultaneous dual-isotope imaging is used to determine 
the distribution of two imaging agents labeled with two different isotopes
and also for simultaneous emission/transmission (E/T) imaging in SPECT,
where one radioisotope is used for transmission scanning while the other is used
for the emission study [3]. The major problem with the simultaneous dual-isotope
acquisition procedure is the cross-talk between the two isotopes. Photons emitted
by one radioisotope can be detected in the energy window dedicated to
the acquisition of photons emitted by the other and conversely. Cross-talk can 
be caused by unscattered photons if the photopeaks corresponding to the two 
radioisotopes partially overlap. Cross-talk is also systematically introduced by 
scattered photons from the highest energy isotope which are detected in the 
energy window corresponding to the lowest energy isotope. The magnitude of 
cross-talk varies with the experimental conditions, but it is generally accepted that the
resulting images are not trustworthy without some cross-talk correction.

There is currently no method accepted as a standard for cross-talk correction.
Symmetrical and off-set energy windows are used to reduce
cross-talk but do not remove it. Subtraction methods involving at least three
energy windows have also been proposed. However, none of these
approaches offers a reliable solution when cross-talk is caused by both scattered
and unscattered photons. In addition, these empirical approaches need substantial
changes and specific calibration for each combination of isotopes.

We describe here a general method for the analysis of multi-isotope acquisi- 
tions using a spectral factor analysis (SFA). SFA corrects for cross-talk due to 
both unscattered and scattered photons. 



2 Theory

As different radioisotopes can be distinguished by their emission energy spectrum,
SFA analyzes the set of spectra detected in the pixels of the planar images
(or projections in SPECT), using either a list mode or a multispectral acquisition
technique. For the sake of simplicity, we consider here planar imaging (the
extension to SPECT is discussed below). A planar acquisition with spectral information
consists of a set of E spectral images, each image including photons
detected in a small energy interval. $x_i(e)$ is the number of photons detected in
pixel i of image e.

The model assumes that each noise-free spectrum can be written as a linear
combination of K spectral components $f_k$ common to all pixels i, i.e.:

$$ x_i(e) = \sum_{k=1}^{K} a_k(i)\, f_k(e) + \varepsilon_i(e) \qquad (1) $$

where $a_k(i)$ is the number of photons in pixel i distributed according to the
spectrum $f_k$ and $\varepsilon_i(e)$ represents noise.

For multi-isotope imaging with R isotopes, the spectral components $f_k$ are
R scatter-free spectra $f_r$ and $K - R$ scatter spectra. For each isotope r, the
coefficients $\{a_r(i)\}$ (i = 1, ..., N, where N is the number of pixels in an image) associated
with the scatter-free spectrum $f_r$ give the scatter-free image of isotope r.
Solving the model consists in estimating the scatter-free and scatter spectra $f_k$
and the associated $a_k(i)$. This is performed using SFA, derived from the latest
developments regarding factor analysis of medical image sequences. In the
following, we briefly describe the four steps of SFA.

Data preprocessing. First, the spectra corresponding to spatial neighbor
pixels are added (e.g., using 4x4 pixel non overlapping ROIs), which is equivalent
to a coarse spatial sampling. This reduces the number of spectra to be analyzed
and increases the signal-to-noise ratio in each spectrum. Spectra corresponding
to irrelevant regions in the images are also discarded, resulting in M spectra $Y_j$.
The model (1) can be written:

$$ Y_j(e) = \sum_{k=1}^{K} a_k(j)\, f_k(e) + \varepsilon_j(e) \qquad (2) $$

where $\{a_k(j)\}_{j=1,\ldots,M}$ is the image (with coarse sampling) associated with
the spectrum $f_k$ and $\varepsilon_j(e)$ represents noise.
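
The pixel grouping used in the preprocessing step can be sketched with a NumPy reshape. This is an illustrative sketch, not the authors' code. The function name and array layout are assumptions, and discarding irrelevant regions is left out.

```python
import numpy as np

def group_pixels(spectral_images, block):
    """Sum spectra over non-overlapping block x block pixel regions.

    spectral_images: array of shape (E, H, W), one image per energy channel.
    Returns an array of shape (E, H // block, W // block).
    """
    n_channels, height, width = spectral_images.shape
    if height % block or width % block:
        raise ValueError("image size must be a multiple of the block size")
    blocks = spectral_images.reshape(
        n_channels, height // block, block, width // block, block
    )
    return blocks.sum(axis=(2, 4))

# Toy example: 32 energy channels, 16x16 pixels, one count in every pixel and channel.
images = np.ones((32, 16, 16))
grouped = group_pixels(images, block=4)

# Each remaining position now holds one spectrum Y_j with 32 channels.
spectra = grouped.reshape(32, -1).T
```

Grouping keeps the total number of counts unchanged while each spectrum collects 16 times more counts, which is the gain in signal-to-noise ratio mentioned above.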

Orthogonal analysis. This stage filters the spectra $Y_j$ to estimate their
noise-free components, assuming these components belong to a low dimensional
space S (typically < 5D). S is estimated using an orthogonal decomposition
adapted to the Poisson nature of the set of spectra, namely
a correspondence analysis (CA). CA yields an orthogonal spectral basis from
which a Q-dimensional space S, spanned by the Q eigenvectors associated with
the largest Q eigenvalues of the covariance matrix decomposed by CA, is obtained.

Oblique analysis. The oblique analysis estimates the spectra $f_k$ underlying
the model (1) assuming they belong to the subspace S. It is also assumed that
the dimension Q of S is equal to the number K of spectra underlying the physical
model. To estimate the $f_k$, a priori knowledge pertaining to the spectra $f_k$ and to
the images $a_k$ must be used [7]. We know that $f_k(e) \geq 0$ and $a_k(j) \geq 0$ since they
represent numbers of photons. In addition, for each scatter-free spectrum, $f_r(e) = 0$
for some energy channels where there is no photopeak. Using this information,
the R scatter-free spectra $f_r$ are first located in S using the target apex-seeking
(TAS) method. Next, the $K - R$ scatter spectra $f_k$ are estimated iteratively
by minimizing the number of negative $f_k(e)$ and $a_k(j)$ values while taking into
account the confidence interval around each estimated $f_k(e)$ or $a_k(j)$.

Oblique projection. An oblique projection finally determines the coefficients
$a_k(i)$ of equation (1) given the original spectra and the estimated spectra
$f_k$. The set of coefficients $\{a_r(i)\}_{i=1,\ldots,N}$ corresponding to the scatter-free
spectrum $f_r$ gives the scatter-free image of the isotope r.
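
The last step recovers the coefficients once the spectra are known. The sketch below swaps in ordinary least squares as a simple stand-in for the oblique projection, whose exact metric is not described here. All spectra and coefficients are hypothetical and noise-free.

```python
import numpy as np

# Three hypothetical spectra over six energy channels (rows: K components, columns: E channels).
f = np.array([
    [0.0, 0.0, 0.1, 0.6, 0.3, 0.0],   # scatter-free spectrum of isotope 1
    [0.0, 0.0, 0.0, 0.1, 0.5, 0.4],   # scatter-free spectrum of isotope 2
    [0.4, 0.3, 0.2, 0.1, 0.0, 0.0],   # scatter spectrum
])

# Hypothetical coefficients a_k(i) for four pixels (rows: pixels, columns: components).
a_true = np.array([
    [100.0,   0.0, 30.0],
    [ 50.0,  50.0, 20.0],
    [  0.0, 200.0, 40.0],
    [ 10.0,  10.0,  5.0],
])

# Noise-free spectra x_i(e) following the linear model of equation (1).
x = a_true @ f

# Solve f^T a_i = x_i for every pixel at once (ordinary least squares, an assumed substitute).
a_est_T, *_ = np.linalg.lstsq(f.T, x.T, rcond=None)
a_est = a_est_T.T

# The first two columns play the role of the scatter-free images of the two isotopes.
scatter_free_images = a_est[:, :2]
```

With noisy Poisson data the estimate would depend on the weighting, which is one reason the method uses a projection adapted to the statistics of the data.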

3 Material and Methods 

The phantom (Fig. 1) consisted of 2 series of 9 overlapping Petri dishes (diameter 8.6 cm,
1.3 cm thick), including various mixtures of I-123 (emission energy of 159 keV)
and Tc-99m (emission energy of 140 keV) in water (Table 1).




Fig. 1. Phantom used for the acquisition

A planar view of the phantom gave an image of 9 dishes with variable mixtures
of Tc-99m and I-123. The total Tc-99m and I-123 activities were 23.1 and
24.8 GBq respectively. A 20 min acquisition (6.45 million counts) was performed
on an Elscint Helix gamma camera, equipped with a low energy high resolution
collimator, using 32 spectral images (3.5 keV wide each) with a matrix 256 x 256
(pixel size = 1.47 mm) between 63 and 175 keV.

Table 1. Percentages of Tc-99m and I-123 activity in each dish of the phantom

dish number             1     2     3      4     5    6      7      8      9
percentage of Tc-99m    36    32    22.6   15    0    68.3   83.5   89.7   0
percentage of I-123     64    68    77.4   85    0    31.7   16.5   10.3   0


The resulting 32 images were processed using SFA: 8x8 pixel grouping,
TAS of the Tc-99m photopeak assuming it was zero between 63 and 126 keV
and between 154 and 175 keV and TAS of the I-123 photopeak assuming it was
zero between 63 and 143.5 keV. A scatter spectrum was estimated using non-negativity
constraints only. The SFA cross-talk free images were compared to
the Tc-99m and I-123 images obtained using “optimal” energy windows: a
15% window centered on 140 keV (129.5-150.5 keV) for Tc-99m and a 154-175
keV window for I-123 (called WIN images below).
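
The window limits quoted above can be checked against the acquisition grid of 32 channels, each 3.5 keV wide, between 63 and 175 keV. The sketch below is illustrative: the function names are hypothetical and the channel indices are zero-based.

```python
import numpy as np

# Edges of 32 contiguous channels, 3.5 keV wide, covering 63 to 175 keV.
edges = 63.0 + 3.5 * np.arange(33)

def energy_window(center_kev, width_fraction):
    """Lower and upper limits of a symmetric window given as a fraction of the center energy."""
    half_width = center_kev * width_fraction / 2.0
    return center_kev - half_width, center_kev + half_width

def channels_in_window(low_kev, high_kev, tol=1e-9):
    """Indices of the channels that lie entirely inside [low_kev, high_kev]."""
    lower, upper = edges[:-1], edges[1:]
    inside = (lower >= low_kev - tol) & (upper <= high_kev + tol)
    return np.flatnonzero(inside)

tc_low, tc_high = energy_window(140.0, 0.15)        # close to 129.5 and 150.5 keV
tc_channels = channels_in_window(tc_low, tc_high)   # channels 19 to 24
i123_channels = channels_in_window(154.0, 175.0)    # channels 26 to 31
```

Both windows fall exactly on channel boundaries and cover six channels each, so the WIN images can be formed by summing whole spectral images.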

The Tc-99m and I-123 images were analyzed by drawing circular ROIs inside
each dish (diameter 4.5 cm). The mean number of counts inside each ROI was calculated.
Using the Tc-99m (resp. I-123) image, the dish with the largest mean
number of counts $N_{\text{Tc-max}}$ (resp. $N_{\text{I-max}}$) was identified and, for each dish d, the
ratio of the mean number of counts $N_{\text{Tc-}d}$ (resp. $N_{\text{I-}d}$) in the dish d to $N_{\text{Tc-max}}$
(resp. $N_{\text{I-max}}$) was determined. These ratios $N_{\text{Tc-}d}/N_{\text{Tc-max}}$ and $N_{\text{I-}d}/N_{\text{I-max}}$ represent
the activity ratios (AR) between different regions in the Tc-99m and
I-123 images. In each dish d, the AR $N_{\text{Tc-}d}/N_{\text{I-}d}$ was also determined. All AR
were compared to their true values theoretically derived given the real activity
in the dishes and the attenuation effect. As this was planar imaging, no absolute
quantitation was attempted.
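
The activity ratios and their errors are simple to compute once the mean ROI counts are available. The sketch below uses hypothetical counts and a hypothetical true ratio. The function names are illustrative.

```python
import numpy as np

def activity_ratios(mean_counts):
    """Ratio of each dish's mean ROI count to the largest mean ROI count in the image."""
    counts = np.asarray(mean_counts, dtype=float)
    return counts / counts.max()

def percent_error(measured, true):
    """Signed relative error in percent."""
    return 100.0 * (measured - true) / true

# Hypothetical mean ROI counts for four dishes in one image; dish 4 has the largest count.
mean_counts = [120.0, 60.0, 30.0, 240.0]
ar = activity_ratios(mean_counts)

# Hypothetical true activity ratio of 0.40 for the first dish.
err_first = percent_error(ar[0], 0.40)
```

For a given absolute bias the relative error grows as the true ratio gets smaller, which may help explain why the largest errors reported below occur in the dishes with the lowest activity ratios.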

4 Results 

The spectra (Fig. 2) estimated using SFA and the location of the spectral windows
used for WIN as defined above show that, when using WIN, cross-talk in
the Tc-99m window is due to scattered photons and unscattered I-123 photons
and that some Tc-99m unscattered photons are outside the Tc-99m window.
On the other hand, cross-talk in the I-123 image is mostly due to scattered
photons. The WIN I-123 window also rejects many I-123 unscattered photons.

Figs. 3a-b show the Tc-99m and I-123 AR measured in the different dishes
for the estimated Tc-99m and I-123 images. Using the WIN Tc-99m image, errors
up to 81% (ROI 3) and 170% (ROI 4) were observed for low $N_{\text{Tc-}d}/N_{\text{Tc-max}}$ values
(22.5 and 11.0% respectively). With the SFA Tc-99m image, the largest errors
observed for $N_{\text{Tc-}d}/N_{\text{Tc-max}}$ AR were 4.4% and 5.8% for ROIs 6 and 8 where the
true AR were 73.2% and 87.8% respectively.

The differences in performance between the methods were less obvious for
the I-123 images, with errors between 1.5% (ROI 6) and 9.7% (ROI 5) for WIN,
and between 0.8% (ROI 9) and 9.3% (ROI 7) for SFA. The I-123 AR measured
in cold dishes 5 and 9 were < 1.5% with SFA and they were between 4.5 and
11.8% with WIN.

Fig. 2. Spectra estimated using SFA and spectral windows used in WIN




Fig. 3. Relative quantitation results from the WIN and SFA Tc-99m and I-123 images



Fig. 3c shows the estimated $N_{\text{Tc-}d}/N_{\text{I-}d}$ AR. The WIN images yielded an
overestimation of the AR for the lowest AR and an underestimation for the
highest AR, with errors between +13.9% (ROI 4) and -9% (ROI 8). SFA images
gave errors between -1.5% (ROI 7) and +4.4% (ROI 1).

5 Discussion and Conclusion 

Simultaneous dual-isotope studies are currently hindered by cross-talk problems,
for which there are no satisfactory solutions yet. The SFA method
offers a general solution, since it can be used a priori for any radioisotope combination,
both for studies involving two radiopharmaceuticals and for E/T studies.
SFA is a data driven approach and the severity of cross-talk does not have to be 
known a priori. However, as the linear model underlying SFA is quite general, 
a priori knowledge must be used to find the solution appropriate to the physics 
of the problem. This a priori knowledge relates to the energy range in which 
the photopeaks should be zero and does not have to be extremely precise: a
change of a few keV in the definition of this energy range (up to 10 in our example)
did not affect the results. SFA corrects for cross-talk due to scattered
and unscattered photons. SFA takes advantage of the Poisson nature of the data 
when filtering the noise (in the orthogonal analysis) and when estimating the 
model components (in the oblique analysis). The method permits a quantitative
interpretation of the results, which is of paramount importance for E/T imaging.
The SFA model is not stationary, i.e. it does not intrinsically assume that the
scatter spectrum has the same shape in every pixel. However, estimating at least
4 spectra is needed to make the analysis non stationary. In our example, accurate
results were obtained when assuming scatter stationarity (i.e. considering 3
factors only).

The challenging Tc-99m/I-123 phantom we considered showed that SFA
outperformed the method using energy windows, which is the only alternative
proposed so far for this pair of radioisotopes.

Although we gave evidence that SFA could offer a solution to the cross-talk
problem, further investigations involving other combinations of radioisotopes, in
emission/emission or E/T studies, should now be conducted. So far, only planar
images have been processed, but SPECT data can be dealt with similarly using a
single SFA of the spectra corresponding to all projections, before reconstruction.

Abstract. We present here a new method for cerebral activation detection.
This method is performed on individual activation maps of any
sort and aims at performing a multi-subject group analysis while preserv- 
ing individual information and overcoming problems induced by spatial 
normalisation. The analysis is made through a multi-scale object-based 
description of individual maps. It is these structural descriptions which 
are compared, rather than the images themselves. The comparison is 
made through a graph, on which a labelling process is performed. The 
label field on the graph is modelled by a Markov random field, which 
allows us to introduce high-level rules of data interrogation. 



1 Introduction 

Understanding the neural substratum of human brain function is a growing field 
of research. Due to the very noisy nature of functional images, brain activation 
detection has essentially been approached so far in terms of statistical analysis 
using a common anatomical reference. Although they have been validated 
in a wide range of applications, these analyses lead to some problems in terms of 
localisation and/or detection with regard to anatomy. In particular, the spatial 
normalisation performed to compare images from different subjects normally 
matches only gross features. Moreover, anatomical information is poorly handled, 
and after a statistical analysis, it is generally difficult to estimate from the group 
result the areas activated in individual subjects. This knowledge should help the 
study of inter-subject functional and anatomical variability and would improve 
localisation with regard to anatomy. We propose here a new method based on a 
description of individual activation maps in terms of structure. This is followed 
by the comparison of these descriptions across subjects, rather than a direct 
comparison of the images at a voxel level in a stereotactic space. The method is 
designed to overcome, as far as possible, the problems induced by spatial 
normalisation. After detection over a group of subjects, the method provides an 
easy way to get back to the individual structures, and more generally permits 
high-level interrogation, and in the future more informed analysis, of functional 
data sets. 

2 Methods 

The method presented here is applicable to any kind of individual “activation 
map” : e.g. PET or fMRI difference images, t-maps, etc. It is divided into the three 
following steps. First, each individual map involved in the study is described by 
its scale-space primal sketch. Second, a graph is built that matches all the primal 
sketches. Finally, a labelling process is performed on the graph, which aims to 
identify the objects representing functional activations and those representing 
noise. 



2.1 The Scale-Space Primal Sketch 

The scale-space primal sketch is a representation, based on well-known 
properties of linear scale-space, allowing the description of the 1st order structure 
of an image. We present its structure very briefly. For more precise details, 
we invite the reader to refer to the cited references, in particular for the 
3-dimensional case applied to activation maps. This hierarchical multiscale 
description makes explicit the behavior of objects (grey-level blobs) through the 
scales of a linear scale-space. It is composed of multiscale objects (scale-space 
blobs) linked by bifurcations representing their relative behavior, as illustrated 
in a symbolic way in Fig. 1. Measurements are assigned to the scale-space blobs 
to characterize their geometrical features and lifetime along the scale axis. 




Fig. 1. A slice of different scale levels of an activation map, the corresponding blobs, 
and a symbolic representation of the scale-space primal sketch 



2.2 The Comparison Graph 

We want to create a comparison graph such that it contains the primal sketches 
of all the subjects involved in the analysis, and such that it makes explicit all 
potential repetitions of an object across subjects, while being exhaustive, but 
without taking any decision about their validity. To compare different primal 
sketches we normalise them with usual procedures. A longer term aim is to 
build the spatial referential using the subjects' individual anatomy and a high-level 
description of this anatomy, in terms of landmarks and identified structures. 
The comparison graph should be a convenient framework for this purpose. The 
first criterion to link two blobs belonging to two different primal sketches is the 
overlap of their spatial and scale support. If it is fulfilled, we create a direct link 
$[b_1 - b_2]_{out1}$ between the two scale-space blobs $b_1$ and $b_2$ (Fig. 2). Since we want 




[Figure 2 labels: direct (out1) link; induced link] 

Fig. 2. A direct link induces additional links at finer scales 



to introduce some flexibility in the position of activations, to overcome potential 
normalisation problems, we have to allow close blobs to be linked even if they 
have no spatial overlap. We therefore use the fact that a direct link might not 
represent exactly an activation but may suggest the presence of an activation 
at a finer scale. We then define induced links (out2 links) as follows. If $b_1$ and 
$b_2$ are two blobs having no direct link between them, they have an induced link 
$[b_1 - b_2]_{out2}$ if they are “under” (in their primal sketches) two blobs $c_1$ and $c_2$ 
having a direct link $[c_1 - c_2]_{out1}$, and if they have a scale overlap (Fig. 2). Allowing 
blobs without spatial overlap to be linked is a key feature of the process, since 
it provides greater flexibility for overcoming spatial normalisation limitations. 
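
The two linking rules can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the implementation used in this work: the `Blob` record, its field names, the representation of the spatial support as a set of voxel coordinates and the `parent` pointer standing for the primal sketch hierarchy are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Blob:
    """Hypothetical scale-space blob record (all field names are assumptions)."""
    name: str
    subject: int
    voxels: frozenset      # spatial support, as a set of voxel coordinates
    scale_min: float       # scale range over which the blob exists
    scale_max: float
    parent: str = ""       # blob directly above it in its own primal sketch


def scale_overlap(a, b):
    """True when the scale ranges of the two blobs intersect."""
    return a.scale_min <= b.scale_max and b.scale_min <= a.scale_max


def ancestors(blob, by_name):
    """All blobs that `blob` lies under in its primal sketch."""
    found = []
    while blob.parent:
        blob = by_name[blob.parent]
        found.append(blob)
    return found


def build_links(blobs):
    """Return (direct, induced) link sets between blobs of different subjects."""
    by_name = {b.name: b for b in blobs}
    pairs = [(a, b) for a, b in combinations(blobs, 2) if a.subject != b.subject]
    # Direct (out1) links: overlap of both spatial and scale support.
    direct = {frozenset((a.name, b.name)) for a, b in pairs
              if scale_overlap(a, b) and a.voxels & b.voxels}
    # Induced (out2) links: no direct link, scale overlap, and both blobs lie
    # under two blobs that are themselves directly linked.
    induced = set()
    for a, b in pairs:
        key = frozenset((a.name, b.name))
        if key in direct or not scale_overlap(a, b):
            continue
        if any(frozenset((ca.name, cb.name)) in direct
               for ca in ancestors(a, by_name)
               for cb in ancestors(b, by_name)):
            induced.add(key)
    return direct, induced
```

With two coarse blobs that overlap spatially and two fine blobs beneath them that do not, the coarse pair receives a direct link and the fine pair an induced one.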



2.3 The Detection Model: Use of a Markovian Random Field 

Activation detection is performed using a labelling process that uses the 
inter-subject comparison graph previously described. Our aim is to associate a positive 
label with each activation in the graph, and a null label with the structures of no 
interest. An activation (i.e. a positive label) is associated with a spatial localisation, 
and can therefore have one occurrence in each of the individual primal sketches. 
The basic model we use to perform the detection is the following: 

1. a blob representing an activation is likely to have high measurements; 

2. two blobs representing the same activation must be linked in the graph and 
have the same positive label; 

3. two blobs representing the same activation are likely to have spatial supports 
close to each other; 

4. an activation should have only zero or one occurrence per subject. 

Given the local aspect of the dependencies that are defined within the graph, 
we model the label field as a Markov random field and, through a classical 
maximum a posteriori process, the optimal labelling is done by minimising a 



Structural Group Analysis of Functional Maps 451 



Gibbs distribution related energy. For more details, the reader should refer 
to the cited references. The energy is defined with potential functions on cliques 
of different order in the graph. Each potential function models either one or two 
rules in the above-specified model. Rule (1.) is modelled using a potential function 
defined over 1st order cliques (blobs). When a blob has a positive label, the higher the 
associated measurements are, the lower the potential is, following a piecewise 
linear function. Rule (4.) is modelled using a function $V_{ps}$, defined on each primal 
sketch, and linearly increasing with the number of occurrences of each positive 
label in the primal sketch. Rules (2.) and (3.) are modelled by an inter primal 
sketch 2nd order clique potential function. On such a clique, when the two blobs 
have the same positive label, the associated potential is a function that decreases 
with a measurement of similarity between the two blobs. This similarity function 
is the second way, in the whole process, to overcome problems induced by spatial 
normalisation. At the moment, we use an overlapping function for out1 links, 
and a Euclidean distance function for out2 links. The aim, in the long term, is 
to have an individual anatomy-related similarity function which somehow would 
provide an improvement to spatial normalisation. 
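
As an illustration of how such an energy could be assembled, the following Python sketch sums the three kinds of potential for a given labelling. It is a toy version under stated assumptions: the breakpoints of the piecewise linear function, the weights `w_ps` and `w_sim`, the choice of penalising only occurrences beyond the first, and all function and argument names are hypothetical and are not taken from the method described above.

```python
from collections import Counter


def data_potential(measure, low=2.0, high=6.0):
    """Piecewise linear first-order potential (hypothetical breakpoints).

    Equals +1 below `low`, -1 above `high`, and is linear in between, so a
    higher measurement gives a lower potential for a positively labelled blob.
    """
    if measure <= low:
        return 1.0
    if measure >= high:
        return -1.0
    return 1.0 - 2.0 * (measure - low) / (high - low)


def total_energy(labels, measures, subject_of, links, w_ps=1.0, w_sim=1.0):
    """Energy of a labelling; label 0 is the null label, labels > 0 are activations.

    `links` maps a pair of blob names to a similarity value in [0, 1].
    """
    energy = 0.0
    # Rule 1: first-order cliques, only positively labelled blobs contribute.
    for blob, label in labels.items():
        if label > 0:
            energy += data_potential(measures[blob])
    # Rule 4: per primal sketch, a penalty growing linearly with the number of
    # occurrences of a positive label (assumed here to start at the second one).
    counts = Counter((subject_of[b], lab) for b, lab in labels.items() if lab > 0)
    energy += w_ps * sum(n - 1 for n in counts.values())
    # Rules 2 and 3: linked blobs sharing a positive label lower the energy,
    # the more so the more similar they are.
    for (a, b), similarity in links.items():
        if labels[a] > 0 and labels[a] == labels[b]:
            energy -= w_sim * similarity
    return energy
```

In this toy setting, labelling two strongly measured and linked blobs from different subjects with the same positive label gives a lower energy than also labelling a weak blob, or than labelling nothing.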

The total energy function is then minimised using a stochastic algorithm, the 
Gibbs sampler with annealing, which is shown to provide good convergence. 
After minimisation, the process produces a set of positive labels, each one repre- 
senting an activation and having an occurrence in a number of primal sketches. 
We therefore know the occurrence, or the absence of occurrence, of each acti- 
vation for any subject. This occurrence can then be mapped on the individual 
anatomy of the subject for localisation considerations. 
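
A generic Gibbs sampler with annealing can be sketched as follows. This is a minimal illustration, not the implementation used here: the geometric cooling schedule, the parameter values and the function name `anneal_gibbs` are assumptions, and the full energy is recomputed for each candidate label for simplicity, where a real implementation would only evaluate the cliques containing the visited site.

```python
import math
import random


def anneal_gibbs(sites, label_values, energy, t0=2.0, cooling=0.95,
                 sweeps=60, seed=0):
    """Minimise `energy(labels)` over a label field by annealed Gibbs sampling.

    `sites` is a list of site identifiers, `label_values` the list of allowed
    labels, and `energy` a function taking a dict site -> label.
    """
    rng = random.Random(seed)
    labels = {site: label_values[0] for site in sites}
    temperature = t0
    for _ in range(sweeps):
        for site in sites:
            # Energy of each candidate label, all other sites being fixed.
            energies = []
            for value in label_values:
                labels[site] = value
                energies.append(energy(labels))
            # Draw the new label from the conditional Gibbs distribution.
            e_min = min(energies)
            weights = [math.exp(-(e - e_min) / temperature) for e in energies]
            labels[site] = rng.choices(label_values, weights=weights)[0]
        temperature *= cooling  # geometric cooling (an assumed schedule)
    return labels
```

As the temperature decreases, the conditional distribution concentrates on the lowest-energy label at each site, so the final sweeps behave almost like a deterministic local minimisation.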

3 Results 

The process presented here has been tested on a PET motor protocol, including 
10 subjects and 12 images per subject. For each subject, an individual statistical 
t-map was first computed using the SPM software, contrasting a periodic 
auditory-cued right hand movement and a rest condition. A primal sketch was 
then built from each of the individual maps, and the 10 primal sketches were 
compared using the labelling process. A group analysis was also performed using 
the SPM software, and used as a reference to validate our results. Numerous 
activations were found at a very significant level in the group map in the expected 
brain regions. 

After our labelling process, several observations arise. 

- All high-significance expected activations were detected, although given the 
functional variability it is difficult to compare a pure group analysis with a 
method that considers individual information. 

- Two false positives were detected, but they were both outside the brain and 
caused by border effects (easily eliminated). 

- A classical threshold on the individual maps yielded poor results for every map, 
being either too selective or too noisy. This shows a crucial advantage of our 
method: the detection is processed for each subject taking into account not only 
the intensity in the map but also the knowledge of other subjects' maps. 

- The relevance of each detected positive label was correlated with its associated 
local energy. This was further confirmed by simulations. In particular, when a 
positive label corresponds to no real activation, its local energy is high enough 
to discriminate it from labels associated with real activations. 

- Localisation accuracy is lowered by the lack of an automatic scale selection 
method to represent each detected scale-space blob. Furthermore, experiments 




Fig. 3. Individual mapping of the primary motor activation on a 3D rendering of 
the subject's anatomy 



with simulated activation maps including two different objects showed a 
detection rate of 100% (no false negatives) for a localisation variability up to twice the 
size of the objects, which shows a good resistance to inter-subject variability. 
Simulations on a very large number of noise images are currently being run to 
obtain a precise evaluation of the error rate. 

4 Conclusion 

We have presented here a new method to analyse brain functional images that 
considers functional activation detection at a structural level, and provides a way 
of getting back to individual results after detection over a group of subjects. It 
uses the power and comprehensiveness of multiscale methods to describe image 
structure by looking at their whole scale-space without any a priori information 
concerning scales of interest and without any “coarse-to-fine” strategy. A major 
difference from classical methods is the comparison at an object level, which 
permits us to introduce higher level criteria for the analysis and is a way to 
overcome inter-subject variability effects. The process has proved to be able to 
efficiently detect expected activations with a PET dataset. It is promising for 
functional MRI studies, since fMRI provides more reliable individual maps than 
PET. Further research still has to be undertaken to solve outstanding questions. 
First, the choice of the optimal scale used to represent (as opposed to detect) 
a scale-space blob remains open, since the extent of the reported activations 
depends on this scale. Secondly, a precise evaluation of the data-driven potential 
functions still has to be properly investigated from the distribution of 
measurements associated with the blobs. We showed that the spurious effects of spatial 
normalisation could be reduced by means of the comparison graph and of an ap- 
propriate definition of similarity between blobs from different subjects. Although 
it is difficult to relate the proposed analysis to standard statistical analyses, it 
is worth noting that there is some analogy with analyses using random effect 
linear models. Specifically, activation detection is performed using 
subject-by-subject variability rather than scan-by-scan variability. Finally, we would 
like to point out the fact that using a Markovian model for the detection allows 
the user of such a system to interrogate the data in ways that can be designed 
according to the experimental question. It is very easy to define new potential 
functions in which one can introduce, for instance, a priori information about 
a precise expected location, or about the search for a network of activations 
instead of isolated ones. Thus, the system can explore multi-subject functional 
data sets in a higher level manner than has been achieved so far. 

Abstract. This paper addresses the problem of accurately mapping 
echo planar image (EPI) data acquired for functional MRI studies to 
conventional T1 weighted anatomical MRI. In particular here we examine 
the correction of spin echo image distortion resulting from magnetic 
field inhomogeneity. To do this we must account for both geometric and 
intensity distortions within the EPI data. This approach combines ideas 
on multi-modality registration criteria, non-rigid registration and models 
of geometric and intensity distortion in MR image formation. Specifically 
the relationship between the geometric and intensity distortion in spin 
echo EPI imaging is used to constrain the geometric correction estimate 
and replaces the arbitrary smoothing energy term in non-rigid registration. 



1 Introduction 

The interpretation of functional magnetic resonance images (FMRI) is heavily 
dependent on their precise anatomical location. It is common for functional 
imaging studies to include an additional conventional T1 acquisition to provide 
anatomical context. Current multi-modality registration methods enable many 
types of functional image data to be accurately aligned with anatomical data. 
These methods generally account for differences in patient positioning 
and imaged field with a global rigid or affine geometric transformation. In practice 
echo planar image (EPI) data used in functional imaging can exhibit severe 
localised geometric distortion. This is particularly apparent in acquisitions 
through the brain where bone or air boundaries with soft tissues result in significant 
magnetic field inhomogeneities. Errors such as these can lead to mis-placing 
of functional signals by many millimeters, resulting in the possible displacement 
of a response into a neighbouring gyrus. 

Current approaches to accurate mapping of this data to anatomical MRI involve 
a correction of these EPI artifacts using field mapping acquisitions 
prior to rigid alignment with anatomical MRI. This requires considerable additional 
imaging time and may introduce errors arising for example from flow 
effects. In this paper we propose the use of a direct non-rigid registration, 
but one employing geometric constraints derived from a model of spin echo imaging 
distortion. The key step here is to use the link between geometric errors and the 
resulting localised intensity distortion in spin echo EPI. This allows us to define 
a direct global intensity criterion expressing the quality of the geometric match 
between EPI and anatomical MRI without the need for additional smoothing 
terms on the estimated warp. 



2 Distortion in Spin Echo Anatomical and EPI Imaging 



During the spin echo imaging process magnetic field gradients are imposed on the 
patient tissues. Any variation from the assumed linear gradient results in phase 
or frequency shifts in the recorded k-space signal. For conventional anatomical 
T1 spin echo imaging there is a displacement error due to local magnetic field 
inhomogeneity $\Delta B_0(x, y, z)$ in the phase ($y$), frequency ($x$) and slice encode ($z$) 
directions. Briefly, following the cited literature and by substitution, these 
displacements from $(x, y, z)$ to $(x_A, y_A, z_A)$ are described by, 



$$x_A = x + \frac{\Delta B_0(x, y, z)}{G_{xA}}, \qquad (1)$$

$$y_A = y, \qquad (2)$$

and

$$z_A = z + \frac{\Delta B_0(x, y, z)}{G_{zA}}, \qquad (3)$$



where $G_{xA}$, $G_{yA}$ and $G_{zA}$ are the imaging gradients imposed in the respective 
axes. The resulting $y$ (phase encode) axis has no distortion, while the $x$ (frequency 
encode) and $z$ (slice encode) axes have magnetic field related displacement 
errors. For typical imaging sequences these may result in pixel shifts of only 
0.1 mm. 

For spin echo EPI functional imaging the displacements take a slightly 
different form, 



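$$x_F = x + \frac{\Delta B_0(x, y, z)}{G_{xF}}, \qquad (4)$$

$$y_F = y + \frac{\Delta B_0(x, y, z)\,(2T_{ramp} + NT)}{G_{yF}\,T_{ramp}}, \qquad (5)$$

and

$$z_F = z + \frac{\Delta B_0(x, y, z)}{G_{zF}}. \qquad (6)$$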



Now, in the phase encode $y$ axis, the resulting displacement is scaled by a factor 
$(2T_{ramp} + NT)/T_{ramp}$ compared to the other axes. The factor $N$ from the imaging 
matrix results in a significant displacement, and for typical imaging parameters 
we have the possibility of shifts of one or more pixels. 

2.1 Intensity Distortion 

Considering the general case of 3D spin echo imaging, the change in coordinate 
system from true point locations $(x, y, z)$ to displaced points at $(x_F, y_F, z_F)$ 
results in a change in signal strength governed by the Jacobian $J$ of the distortion 
transformation. So, if we have an estimate of the geometric correction 
mapping from the correct anatomical space to the distorted EPI space, 
$T_e : (x, y, z) \mapsto (x_F, y_F, z_F)$, the estimate of the corresponding corrected EPI image 
intensity value $f_e(x, y, z)$ from the distorted measured values $f_m(x_F, y_F, z_F)$ 
is given by 

$$f_e(x, y, z) = f_m(T_e(x, y, z)) / J(x, y, z). \qquad (7)$$

This relationship is the key to introducing geometrical constraints into the 
intensity based correction criterion. Effectively it can be seen as a form of signal 
conservation in the distortion process. As the image is compressed locally, signal 
from many voxels is mapped to fewer voxels. Conversely, where the image is 
expanded, signal from one voxel is mapped to many voxels. 
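
As a purely illustrative sketch of the signal conservation idea, the following Python fragment simulates a one-dimensional distortion and then applies Eq. (7). The distortion T(x) = x + a sin(x), the test signal and all function names are assumptions made for the example. J is taken here to be dx/dx_F, the convention under which Eq. (7) restores the undistorted signal; the text above does not spell out this convention.

```python
import math


def distort(x, a=0.3):
    """Hypothetical 1D distortion T(x) = x + a*sin(x); returns (x_F, dx_F/dx)."""
    return x + a * math.sin(x), 1.0 + a * math.cos(x)


def simulate_measured(true_signal, xs, a=0.3):
    """Distorted samples under signal conservation.

    Conservation f(x) dx = f_m(x_F) dx_F gives f_m(x_F) = f(x) / (dx_F/dx):
    where the image is compressed (dx_F/dx < 1) the measured signal is brighter.
    """
    measured = []
    for x in xs:
        x_f, dxf_dx = distort(x, a)
        measured.append((x_f, true_signal(x) / dxf_dx))
    return measured


def correct(measured, xs, a=0.3):
    """Apply Eq. (7), f_e(x) = f_m(T(x)) / J(x), with J = dx/dx_F (assumed)."""
    corrected = []
    for (_, f_m), x in zip(measured, xs):
        _, dxf_dx = distort(x, a)
        jacobian = 1.0 / dxf_dx
        corrected.append(f_m / jacobian)
    return corrected
```

Near x = π the assumed distortion compresses the image, so the simulated measurement is brighter than the true signal, and the correction brings it back to the true value.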



2.2 Relative Distortion 

If we assume that the rigid body rotations between the axes of the two scans 
will be small (say less than 5 degrees), then the phase, frequency encode and 
slice select directions in the EPI and anatomical MR are closely aligned. If the 
frequency and slice select gradients are similar ($G_{xF} \approx G_{xA}$ and $G_{zF} \approx G_{zA}$) 
then the resulting distortions will be small so that $x_F \approx x_A$ and $z_F \approx z_A$. 
This leaves displacement in the phase encode $y$ axis. In a conventional spin echo 
scan, the displacement due to the field inhomogeneity (2) is negligible. In EPI 
imaging, the acquisition of multiple phase encode steps, with a single excitation 
pulse, results in significant displacements along the phase encode axis. 
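
The relative size of the phase encode displacement can be illustrated numerically from Eqs. (4) and (5). The sketch below is only an example: the function and argument names are hypothetical, no particular scanner parameters are implied, and all quantities are assumed to be expressed in mutually consistent units.

```python
def phase_encode_scale(t_ramp, n_lines, t_line):
    """Scaling factor (2*T_ramp + N*T) / T_ramp appearing in Eq. (5)."""
    if t_ramp <= 0:
        raise ValueError("t_ramp must be positive")
    return (2.0 * t_ramp + n_lines * t_line) / t_ramp


def epi_displacements(delta_b0, g_x, g_y, t_ramp, n_lines, t_line):
    """Frequency and phase encode displacements (dx, dy) from Eqs. (4) and (5)."""
    dx = delta_b0 / g_x
    dy = delta_b0 * (2.0 * t_ramp + n_lines * t_line) / (g_y * t_ramp)
    return dx, dy
```

With equal gradients in both directions, the ratio dy/dx equals the scaling factor, which grows with the number N of phase encode lines.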



3 Correction Criteria Using Signal Conservation Model 

Entropy based multi-modality registration criteria provide a powerful 
approach to spatially aligning one image with another where there is some spatial 
correspondence of structure delineated by the two images. The key problem with 
non-rigid registration is the need to introduce constraints on the smoothness of 
the geometric transformation. This prevents unconstrained motion in regions 
of the image where there are no corresponding structures, particularly in the 
multi-modality case. The nature of the smoothing approach depends on the 
application. A common approach to smoothing the geometric transformation 
is to include an energy term, such as Tikhonov regularisation, which leads 
to the optimisation of a cost function which is a combination of the intensity 
similarity and the smoothness energy. 

The basic idea behind our correction approach is to use a global similarity 
measure between the EPI image $f_e(T(x, y, z))$ with estimated correction 
transformation $T_e(x, y, z)$ and a conventional anatomical acquisition $g(x, y, z)$, as a 
measure of geometric and intensity correction. 

Using the Jacobian term to modify the EPI intensities will result in bright 
regions of the image being more sensitive to local changes in the transformation 
estimate than darker regions. To avoid this we take logarithms of equation (7) 
and derive a correction criterion between log corrected EPI image values, with 
$f' = \ln(f)$ and $J' = \ln(J)$, 

$$f'_e(x, y, z) = f'_m(T_e(x, y, z)) - J'(x, y, z), \qquad (8)$$

and the original anatomical MR values $g$. We maximise an entropy based 
registration criterion derived from mutual information which provides 
a form of image overlap invariance, 

$$\arg\max_{T_e} \left( \frac{H(g) + H(T_e(f'_e))}{H(g, T_e(f'_e))} \right). \qquad (9)$$

The terms $H(g)$ and $H(T_e(f'_e))$ are the marginal entropies of values in the anatomical 
and EPI images respectively and $H(g, T_e(f'_e))$ is the joint entropy. All entropies are 
evaluated from values in the overlap of the two modalities. 
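
The ratio in (9) can be evaluated from a discrete joint histogram of paired intensity values. The Python sketch below is a minimal illustration and not the code used for the experiments: the function names, the equal-width binning and the use of natural logarithms are assumptions, and the two input sequences are taken to hold the intensities of corresponding voxels in the overlap region.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a histogram given as counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def normalised_entropy_criterion(g_values, f_values, bins=64):
    """(H(g) + H(f)) / H(g, f) from a bins x bins joint histogram."""
    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    g_lo, g_hi = min(g_values), max(g_values)
    f_lo, f_hi = min(f_values), max(f_values)
    joint = Counter(
        (bin_index(g, g_lo, g_hi), bin_index(f, f_lo, f_hi))
        for g, f in zip(g_values, f_values)
    )
    # Marginal histograms are obtained by summing the joint histogram.
    g_marginal, f_marginal = Counter(), Counter()
    for (gi, fi), count in joint.items():
        g_marginal[gi] += count
        f_marginal[fi] += count
    h_joint = entropy(joint.values())
    if h_joint == 0:
        raise ValueError("joint entropy is zero; criterion undefined")
    return (entropy(g_marginal.values()) + entropy(f_marginal.values())) / h_joint
```

The ratio equals 1 when the two sets of intensities are statistically independent and reaches 2 when one is a one-to-one function of the other, which is why larger values indicate better alignment.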



4 Experimental Registration Results 

The transformation estimate between the two images is controlled by a 
combination of the local warp to account for distortion and the six rigid transformation 
parameters determining patient positioning, 

$$T_e = T_{patient} + T_{\Delta B_0}. \qquad (10)$$

In EPI field mapping techniques the measured displacement field is commonly 
approximated by a low order polynomial (e.g. [7,8]). Here we use a cubic 
B-spline to parameterise the local warp over the image volume. Registration was 
initiated by first running a rigid registration to form a good starting estimate. 
From this estimate, a two step process was applied iteratively, consisting of a 
simple gradient ascent with respect to the local B-spline parameters followed 
by a global rigid re-registration. Spline grid points with an isotropic spacing 
of 15 mm were used. Tri-linear interpolation was used to estimate intermediate 
values in the EPI image for voxel locations in the anatomical spin echo image. 
A discrete histogram of 64 x 64 intensity bins was used to estimate the marginal 
and joint entropies in (9). 
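
To make the B-spline parameterisation concrete, the sketch below evaluates a displacement from control coefficients on a uniform grid. It is a one-dimensional simplification of the three-dimensional warp described above, using the standard uniform cubic B-spline basis; the function names and the clamping of indices at the borders are assumptions made for the example.

```python
def cubic_bspline_basis(u):
    """The four uniform cubic B-spline weights for a fractional position u in [0, 1)."""
    return (
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    )


def displacement(y, coeffs, spacing=15.0):
    """Displacement at position y (mm) from control coefficients spaced `spacing` mm apart."""
    cell = int(y // spacing)
    u = y / spacing - cell
    total = 0.0
    for k, weight in enumerate(cubic_bspline_basis(u)):
        # The four neighbouring control points; indices are clamped at the borders.
        index = min(max(cell - 1 + k, 0), len(coeffs) - 1)
        total += weight * coeffs[index]
    return total
```

Because the four weights always sum to one, a constant set of coefficients yields a constant displacement, and each coefficient only influences the warp within two grid spacings on either side of its control point.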

The registration algorithm was applied to correcting spin echo EPI to MRI 
for a volunteer image set. Examples of the quality of the correction are provided 
by the coronal slices in Fig. 1. Here the downward displacement of the temporal 
lobes is recovered while in the medial portion of the brain, displacement in the 
opposite direction in the same slice is also recovered. 

Fig. 1. Example coronal slices through the spin echo MR volume toward the front of 
the patient with an iso-intensity contour from spin echo EPI overlaid to illustrate 
spatial alignment. Initial global rigid registration estimate (left) and estimate with 
non-rigid registration (right) 

A second set of anatomical and functional images was acquired and for 
these a set of t-maps indicating activation was then calculated. The alignment 
algorithm was applied to correct one frame of spin echo EPI from this sequence 
onto a high resolution anatomical scan. This transformation was then used 
to map the t-map data back to the anatomical reference. Figure 2 illustrates the 
effect of the local geometric correction on the location of t-map activations. 




Fig. 2. Example correction on a functional imaging study. Coronal slices through the 
spin echo T1 MR anatomical image volume with t-map activation displayed as contour. 
Global rigid only registration (left) and local warp with signal conservation term (right) 
showing displacement of activation into gray matter 



5 Discussion 

In this paper we have begun to address the general problem of precisely aligning 
functional MRI scans with conventional anatomical acquisitions. We have con- 
centrated here on spin echo imaging acquisitions. We have used knowledge of 
the image formation and distortion processes in the two MRI scans to impose 
constraints on the correction estimate. In particular we have modified a common 
entropy based alignment criterion using knowledge of signal conservation to 
enforce geometric constraints on the correction warp. It is interesting to note that 
the causes of distortions are commonly related to material boundaries within the 
patient (for example soft-tissue boundaries with bone in the orbits and around 
the petrous bone) which are themselves visible within the MRI acquisitions. 
These boundaries therefore inherently provide local image constraints on the 
alignment close to where distortions are occurring. 

Overall the initial results with this approach indicate that combining MR distortion 
models with multi-modality registration techniques can produce precise 
mapping of functional information to anatomical images, and provide a viable 
alternative to field mapping techniques. 

Abstract. Point-based registration is performed by matching a set of 
homologous points in two spaces. It is common to use such techniques as 
an aid to navigation during neurosurgical procedures. For many years, 
statistics concerning Target Registration Error (TRE) have been studied 
qualitatively using numerical simulations. We present here an expression 
that gives a good approximation to the distribution of TRE for any given 
target and configuration of fiducial points. 



1 Introduction 



The point-based registration problem is as follows: given a set of homologous 
points in two spaces, find a transformation that brings the points into approximate 
alignment. In many cases the appropriate transformations are rigid, consisting 
of translations and rotations. Medical applications abound in neurosurgery, 
for example, where the head can be treated as a rigid body. 
The points, which we will call fiducial points, may be anatomical landmarks or 
may be produced artificially by means of attached markers. In the case that 
we address here, the spaces are three dimensional and may consist, for example, 
of two MR volumes, a CT volume and an MR volume or PET volume, or, 
in the case of image-guided neurosurgical applications, an image volume and 
the physical space of the operating room itself. The rigid-body, point-based image 
registration problem is typically defined to be the problem of finding the 
translation vector and rotation matrix that produce the least-squares fit of the 
corresponding fiducial points. The appropriate translation vector is simply the 
mean displacement between the two point sets. The problem of determining 
the rotation matrix can be easily reduced to the “Orthogonal Procrustes 
problem”. Peter Schönemann published the first solution to that problem in 
1966 [20]. His solution was rediscovered independently in 1983 by Golub and van 
Loan and again in 1987 by Arun et al. These latter solutions, unlike the 
former, employ the method of Singular Value Decomposition (SVD), but they 
can easily be shown to be equivalent to Schönemann’s solution. 

The solution is unique, but can be expected to yield an imperfect registration 
in the presence of errors in locating the points. Maurer et al. suggested 
three useful measures of error for analyzing the accuracy of point-based 
registration methods. 

1. Fiducial localization error (FLE), which is the error in locating the fiducial 
points. 

2. Fiducial registration error (FRE), which is the root-mean-square distance 
between corresponding fiducial points after registration. 

3. Target registration error (TRE), which is the distance between corresponding 
points other than the fiducial points after registration. 

The term “target” is used to suggest that the points are directly associated 
with the reason for the registration. In medical applications they are typically 
points within, or on the boundary of, lesions to be resected during surgery or 
regions of functional activity to be examined for diagnostic purposes. 
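
The last two measures can be computed directly once a rigid transformation has been estimated. The following Python sketch is an illustration only: the function names are hypothetical, points are treated as row vectors mapped to p R + t, and the rotation and translation are assumed to be supplied by some registration routine that is not shown.

```python
import math


def apply_rigid(point, rotation, translation):
    """Map a 3D row vector p to p R + t (R given as a 3x3 nested list)."""
    return tuple(
        sum(point[j] * rotation[j][k] for j in range(3)) + translation[k]
        for k in range(3)
    )


def fre(fiducials_x, fiducials_y, rotation, translation):
    """Root-mean-square distance between registered fiducials and their partners."""
    squared = [
        math.dist(apply_rigid(x, rotation, translation), y) ** 2
        for x, y in zip(fiducials_x, fiducials_y)
    ]
    return math.sqrt(sum(squared) / len(squared))


def tre(target_x, target_y, rotation, translation):
    """Distance between a registered target point and its true position."""
    return math.dist(apply_rigid(target_x, rotation, translation), target_y)
```

FRE is available in practice because the fiducials are observed in both spaces, whereas TRE can only be evaluated when the true target position is known, as in a simulation.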

Much work has been done using numerical simulations 
to investigate the properties of FRE and TRE. Unknown to many of those 
performing these simulations, Sibson gave in 1979 an approximation to the 
distribution of FRE. In 1998 Fitzpatrick et al. derived an equation which allows 
calculation of an approximation to the root mean square value of TRE, 
and agrees with the published simulations. In what follows, however, we give for 
the first time an approximation to the distribution of TRE, rather than just its 
expected value. 

2 The Model 

We make a simplifying assumption in this work: that the fiducial localization 
error in one space is identically zero. This assumption does not generally hold 
in real registration problems, but the derivation may easily be extended to the 
case in which FLE is nonzero in both spaces. 

Let N be the number of fiducials and K be the spatial dimension. In general, 
we may write X as the N-by-K matrix whose rows correspond to the position 
vectors of the fiducial points in one space, and Y as the N-by-K matrix 
representing the fiducials in the other space. The registration problem is to find a 
K-by-K orthogonal matrix, R, and a 1-by-K translation vector, t, so that the 
points $x_i R + t$ are in optimal alignment with the corresponding points $y_i$ in Y. 
By “optimal alignment”, we mean that rms(FRE) is minimized, i.e., R and t 
are chosen to minimize 

$$\mathrm{tr}\left((Y - XR - t)^{T}(Y - XR - t)\right). \qquad (1)$$

In this work, we assume that X is related to Y by a rigid-body transformation 
representing a re-orientation of the rigid body to which the points are attached, 
and an N-by-K matrix F of perturbations representing the fiducial localization 
error. We assume that the elements of F are independent, zero-mean normal 
variables with equal variance, i.e., that the FLE has the same distribution at 
each fiducial point and in each of the coordinate directions at every point. This 
assumption allows the use of a closed-form solution for the registration problem 
itself, and as pointed out by Sibson, permits us to neglect the rigid body 
transformation relating X and Y, as FRE and TRE are independent of this 
re-orientation. We note that, under these assumptions, the variance of each element 
of F is equal to $\langle \mathrm{FLE}^2 \rangle / K$. 

We thus simplify the problem to that of registering X to Y = X + F. As the 
choice of origin for X is arbitrary, we choose the centroid of X to be the origin. 
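
The relation between the variance of the elements of F and the mean squared FLE can be checked with a short simulation. The sketch below is illustrative only: the function names and the sample sizes are assumptions, and `fle_rms` denotes the root mean square FLE, so that each element of F has variance fle_rms² / K.

```python
import random


def sample_fle_matrix(n_fiducials, fle_rms, k=3, rng=None):
    """Draw an N-by-K perturbation matrix F with independent normal elements."""
    rng = rng or random.Random()
    sigma = fle_rms / k ** 0.5  # per-element standard deviation
    return [[rng.gauss(0.0, sigma) for _ in range(k)] for _ in range(n_fiducials)]


def mean_squared_fle(f_matrix):
    """Average squared length of the rows of F, an estimate of <FLE^2>."""
    return sum(sum(v * v for v in row) for row in f_matrix) / len(f_matrix)
```

For a large number of rows, the average squared row length approaches fle_rms², which is the sum of the K per-element variances.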



3 The Distribution for the Case K = 3 



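Choosing coordinate axes coincident with the principal axes of the fiducial 
configuration, and with the fiducials’ centroid as the origin, it may be shown 
(see footnote 1) that 

$$\mathrm{TRE}^2(x, y, z) \approx \frac{\langle \mathrm{FLE}^2 \rangle}{3}\left(k_1 \chi_1^2 + k_2 \chi_2^2 + k_3 \chi_3^2\right), \qquad (2)$$

where $k_1$ and $k_2$ are functions of the target position $(x, y, z)$ and of the 
fiducial configuration (their expressions are not legible in this copy), 

$$k_3 = 1/N,$$

and the $\chi_i^2$ are chi-squared variables. 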

This equation provides an approximation to the true distribution of TRE^. By 
taking the expected value of Eq. |21 noting that the expected value of each Xi 
variable is 1, we have that 





$$\langle \mathrm{TRE}^2(x, y, z) \rangle \approx \frac{\langle \mathrm{FLE}^2 \rangle}{3}\,(k_1 + k_2 + k_3). \qquad (3)$$



This is in agreement with the expression derived previously and, as shown there,
exhibits the 1/√N dependence observed by Hill, Evans, and Maurer,
and the ellipsoidal spatial dependence observed recently by Maurer and by
Darabi.
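
As an illustration of how Eq. 2 can be used, the following minimal sketch draws TRE² values from the weighted chi-squared form and compares the sample mean with the expected value in Eq. 3. The weights `k` and the value of `fle2` are illustrative placeholders, not values from the paper, and each chi-squared variable is assumed to have one degree of freedom (consistent with an expected value of 1).

```python
import math
import random


def sample_tre_squared(k, fle2, rng):
    """Draw one TRE^2 value from the approximation in Eq. 2.

    k    : (k1, k2, k3) weights (hypothetical inputs in this sketch)
    fle2 : <FLE^2> in mm^2
    """
    # A chi-squared variable with one degree of freedom is the square
    # of a standard normal variable.
    chi2 = [rng.gauss(0.0, 1.0) ** 2 for _ in k]
    return (fle2 / 3.0) * sum(ki * ci for ki, ci in zip(k, chi2))


def expected_tre_squared(k, fle2):
    """Expected value of Eq. 2: each chi-squared term has mean 1 (Eq. 3)."""
    return (fle2 / 3.0) * sum(k)


rng = random.Random(0)
k = (0.5, 0.4, 0.1)  # illustrative weights only, not from the paper
fle2 = 10.0          # mm^2, illustrative
samples = [sample_tre_squared(k, fle2, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
rms_tre = math.sqrt(sample_mean)  # rms TRE implied by the samples
```

Percentiles of TRE can be read off the sorted square roots of `samples` in the same way.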



4 Comparison with Simulations 

As we do not have access to very large numbers (“large” being of the order of
tens of thousands) of patient datasets, we must rely on numerical simulations
to check the correctness of the result given in Eq. 2. We chose four values of N
for which to perform the comparison: N = 3, 4, 10, 20. We used the same model
to simulate FLE and thus generate values of TRE as in our previous work,
generating fiducials within a cube of side 200 mm, targets within a cube of
side 400 mm, and FLE in each direction as a normal variable whose variance
was 3.33 mm². Using this fairly large value of FLE allows us to be conservative
with the statements we make concerning the quality of the approximation to
TRE², as the difference between the TRE² distribution given in Eq. 2 and the true
distribution will tend to grow with increasing FLE.

¹ This technical report is available on the World Wide Web as
http://cswww.vuse.vanderbilt.edu/~jayw/tre_dist.ps or tre_dist.pdf

For each perturbation and registration iteration in the simulation, we output
a value of TRE. We generated an equal number of TRE values using Eq. 2, with
a random number generator employed to produce samples of the chi-squared
variables. For each value of N, we produced 1,000,000 simulated TRE values and
the same number of values based on Eq. 2, which we will call “generated” values.
We compared the two distributions using the Kolmogorov-Smirnov test. For
the cases N = 3 and 4, the K-S test showed a significant difference (p < 0.01)
between the distributions. For N = 10 and 20, the test showed that the difference
between the distributions was not significant (p > 0.05).
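
The comparison described above can be sketched with a plain two-sample Kolmogorov-Smirnov statistic. This is a minimal illustration, not the authors' code: the rejection rule uses the standard large-sample critical value c(α) = sqrt(-ln(α/2)/2), and the two demonstration samples are synthetic normal draws rather than TRE values.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_reject(a, b, alpha):
    """True if the samples differ at level alpha (large-sample approximation)."""
    n, m = len(a), len(b)
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return ks_statistic(a, b) > c * math.sqrt((n + m) / (n * m))


rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # clearly different
rejects_shift = ks_reject(reference, shifted, alpha=0.01)
```

With a million values per sample, as in the text, even small systematic differences between distributions become detectable, which is consistent with the significant result reported for N = 3 and 4.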

To explore the differences between the true and approximate distribution,
we next performed ten runs each of 1,000,000 iterations for the simulator and
generator. In the tables that follow, we show the percentage difference,
100(generated - simulated)/simulated, in rms value, median, and 95th percentile
values between the simulated and generated value for each value of N. For all
the tabulated values, the difference between the simulated and generated value
was significant (two-tailed t-test, P < 0.01).



Table 1. Simulated vs Generated rms TRE values (mm)

 N   Simulated (± sd)   Generated (± sd)   % difference
 3   6.8701 (0.0026)    6.8681 (0.0037)    -0.0291
 4   4.6866 (0.0026)    4.6845 (0.0019)    -0.0448
10   1.8552 (0.0009)    1.8547 (0.0008)    -0.0270
20   1.3812 (0.0008)    1.3809 (0.0007)    -0.0217



Table 2. Simulated vs Generated median TRE values (mm)

 N   Simulated (± sd)   Generated (± sd)   % difference
 3   5.7220 (0.0030)    5.7358 (0.0028)    0.2412
 4   3.6902 (0.0031)    3.7195 (0.0023)    0.7940
10   1.5834 (0.0006)    1.5845 (0.0010)    0.0695
20   1.1763 (0.0009)    1.1780 (0.0011)    0.1445

Table 3. Simulated vs Generated 95th percentile TRE values (mm)

 N   Simulated (± sd)   Generated (± sd)   % difference
 3   11.8465 (0.0069)   11.8167 (0.0116)   -0.0298
 4   8.4374 (0.0035)    8.3835 (0.0072)    -0.0539
10   3.1189 (0.0033)    3.1163 (0.0013)    -0.0036
20   2.3270 (0.0008)    2.3236 (0.0014)    -0.0034



5 Discussion 

We can see from Tables 1, 2 and 3 that the distribution given in Eq. 2 is a good
approximation to the actual distribution of TRE, for the fiducial configurations
and targets which we used. The generated values do not match the simulated
ones exactly: they tend to overestimate the median and underestimate the 95th
percentile. However, we note that for our configurations, a conservative estimate
of TRE may be generated by simply increasing the generated value by 1% at
the 95th percentile: in all cases, this gives a value which is above the 99% upper
confidence bound of the mean simulated value. This shows that the results given
by Eq. 2 are close enough to the exact values to be of use to those who wish to
gain a conservative, but fairly accurate, estimate of TRE in clinical practice.
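
The 1% rule can be checked against the values in Table 3 with a few lines of arithmetic. The sketch below rests on two assumptions that the text does not spell out: the tabulated sd is taken across the ten runs, so the standard error of the mean is sd/√10, and the confidence bound uses the Student t quantile for 9 degrees of freedom (3.250, the 0.995 quantile, chosen here as the more conservative option).

```python
import math

# (N, simulated mean, simulated sd, generated mean), copied from Table 3.
TABLE_3 = [
    (3, 11.8465, 0.0069, 11.8167),
    (4, 8.4374, 0.0035, 8.3835),
    (10, 3.1189, 0.0033, 3.1163),
    (20, 2.3270, 0.0008, 2.3236),
]
RUNS = 10       # ten runs per value of N, as stated in the text
T_CRIT = 3.250  # Student t, 9 degrees of freedom, 0.995 quantile (assumed)


def conservative_check(sim_mean, sim_sd, gen_mean, inflate=0.01):
    """Return (inflated generated value, upper bound on simulated mean)."""
    upper = sim_mean + T_CRIT * sim_sd / math.sqrt(RUNS)
    return gen_mean * (1.0 + inflate), upper


results = {}
for n, sim, sd, gen in TABLE_3:
    inflated, upper = conservative_check(sim, sd, gen)
    results[n] = inflated > upper  # True if the 1% rule is conservative
```

Under these assumptions the inflated generated value exceeds the bound for all four values of N.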

6 Conclusion 

We have derived an approximation to the distribution of TRE, and shown via
numerical simulations that the result is close enough to the exact one to be of
use for clinical estimation of expected values and confidence intervals for TRE.

Abstract. This paper describes a fast Mutual Information (MI) method
for registering volumetric medical images. The new technique originates
from the method designed by Viola, wherein registration is achieved
by iteratively adjusting the relative position and orientation until the
MI between two volumetric images is maximized. In this iterative process,
if n samples are used then there are O(n²) exponential
calculations per iteration. The method proposed in this paper reduces
the number of exponential computations by using an index table for
estimating the Gaussian density functions (GDF). The index table is
pre-computed using automatic segmentation based on zero-crossing
of the wavelet transform. Thus a majority of exponential computations
is reduced to index-intensity comparisons. The table lookup process
is sped up using a search mechanism based on probability priority.
The proposed method has been successfully used to register both normal
and pathological MRI and CT datasets. Experimental results show
that this approach yields nearly identical results in a fraction of the time
taken by the original method. The speedup increases with the number of
samples used. For example, with 50 samples the speedup is 2.73 and for 100
samples it increases to 5.5.



1 Introduction 

A variety of volume registration methods have been described in the literature.
Most of them either involve user-based homologous feature selection or tedious
preprocessing such as segmentation of surfaces or tissue layers. Over the last
few years, approaches based on “similarity metrics” have begun to appear and
MI is one of them. MI methods assume little about the functional relationship
between the intensities of the two images and do not require any segmentation.
Hence they are popular and useful. MI methods have been used to solve
different types of registration problems.

MI is defined in terms of entropy, which is the expectation of the negative
logarithm of the probability density. In one approach, the joint and marginal
distributions are estimated by normalizing the joint and marginal histograms of
the overlapping parts of both images. Calculation of histograms in each iteration
is prohibitively expensive. In Viola's method, the Parzen window method is used
on a set of samples drawn from the overlapping parts of the two images to
estimate the image intensity distribution. This approach did improve the speed
performance. However, if n samples are used then there are O(n²) exponential
calculations per iteration for the estimation of GDF. This paper presents an
approach which speeds up the method of Viola using the following strategies.

— Estimation of GDF is simplified using a lookup table which is built in the 
pre-processing stage. Thus a large number of exponential computations are 
reduced to mere index-intensity comparisons. 

— To optimize the construction of the lookup table and hence pre-processing time,
the GDF is computed and stored for only a few relevant intensities. The
relevant intensities are identified using segmentation based on an automatic
thresholding method that uses the zero-crossing of the wavelet transform
[3]. MI methods do not require any pre-segmentation. However, the thresholding
method used here is low-cost, automatic and does not require a priori
information or expert guidance.

— To speed up the table retrieval, a search scheme based on probability priority 
is used. 

The remaining sections are organized as follows. In Sect. 2, an overview of 
the method is given. Section 3 addresses the proposed method. Section 4 has 
implementation details and Sect. 5 provides the conclusion. 



2 MI Method 



Given a reference volume with intensities u(p) and a test volume with intensities
v(p), the mutual information I is defined in terms of entropy and is a function of
the transformation T:

$$I(u(p), v(T(p))) = h(u(p)) + h(v(T(p))) - h(u(p), v(T(p))),$$



where h(.) is the entropy of a random variable. We try to seek a transformation 
T that maximizes the mutual information between these two volumetric images. 
In order to seek a maximum of the MI, an approximation to its derivative can 
be given as follows: 



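$$\frac{dI}{dt} \approx \frac{1}{N_B} \sum_{s_i \in B} \sum_{s_j \in A} (v_i - v_j)^{\top} \left[ W_v(v_i, v_j)\,\psi_v^{-1} - W_{uv}(w_i, w_j)\,\psi_{vv}^{-1} \right] \frac{d}{dt}(v_i - v_j),$$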



where A and B are the two sample sets, N_B is the number of samples in B,
u_i = u(p_i), v_i = v(T(p_i)), and w_i = [u_i, v_i]ᵀ. t is the parameter vector
(rotation vector and translation components) of the transformation. For the
optimization, we prefer the rotation-vector representation, which uses fewer
parameters and requires no constraints. The derivatives of intensities with
respect to the rotation vector can be inferred from its antisymmetric matrix
operator. The weighting factors are defined as:



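$$W_v(v_i, v_j) = \frac{G_{\psi_v}(v_i - v_j)}{\sum_{s_k \in A} G_{\psi_v}(v_i - v_k)}, \qquad W_{uv}(w_i, w_j) = \frac{G_{\psi_{uv}}(w_i - w_j)}{\sum_{s_k \in A} G_{\psi_{uv}}(w_i - w_k)}.$$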

G is the GDF with the covariance ψ. When s is a vector, ψ is the covariance
matrix (assumed to be diagonal). A stochastic gradient descent scheme is used to
optimize the parameters of the transformation. The registration is performed in
a coarse-to-fine manner on a hierarchy of data volumes generated
by wavelet decomposition.
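
The entropy-based definition of MI at the start of this section can be illustrated with a simple histogram estimate. Note that this sketch uses discrete joint histograms, not the Parzen window estimate of the method described here; the function names are introduced only for illustration.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in nats) of a histogram stored as a Counter."""
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mutual_information(u, v):
    """I(u, v) = h(u) + h(v) - h(u, v) from paired discrete intensities."""
    if len(u) != len(v):
        raise ValueError("u and v must contain paired samples")
    n = len(u)
    h_u = entropy(Counter(u), n)
    h_v = entropy(Counter(v), n)
    h_uv = entropy(Counter(zip(u, v)), n)  # joint histogram of pairs
    return h_u + h_v - h_uv


# Intensities that determine each other share ln 2 nats of information;
# independent pairings share none.
mi_dependent = mutual_information([0, 0, 1, 1], [5, 5, 9, 9])
mi_independent = mutual_information([0, 0, 1, 1], [5, 9, 5, 9])
```

A registration loop would re-evaluate such a quantity for each candidate transformation T and keep the one that maximizes it.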

3 Speedup of MI method 

In the process of searching for the transformation by the stochastic gradient de- 
scent, a large number of exponential calculations are involved at each iteration. 
For example, even for relatively small sample sizes (N_A = N_B = 50), there are
at least 5000 exponential computations in each iteration. The gradient of the MI
has to be updated at each iteration and hence is computationally intensive. This 
can be circumvented by creating an index table before the iterative process. The 
index table stores the values of the GDFs. However, if the exponential compu- 
tations for all possible intensity values are performed, it requires considerable 
pre-processing time. A possible solution to this bottleneck is to store the GDFs 
of only a few relevant intensities. This could be done by mapping all the intensi- 
ties onto a smaller range. Such ad-hoc mapping would result in inaccuracies and 
information loss. We employ an automatic approach based on zero-crossing of
the wavelet transform for selecting the thresholds. Thresholds are located to
the left/right of the positive/negative crossover of zero-crossing in a convolved
histogram. The representative relevant intensity is chosen as the maximum
point between the positive and negative crossovers. To ensure the validity of the
thresholds and relevant intensities across multiple levels, a coarse-to-fine
adjustment of the thresholds, which takes advantage of multi-scale information,
is given by a minimum distance criterion. The index table stores the result of the
GDFs for these relevant intensities only. In the majority of cases, this table can
be used directly to get the GDF. In other words, a large number of exponential
computations are reduced to mere comparisons during the table retrieval process.
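
The index-table idea can be sketched as follows. This is a simplified one-dimensional illustration under stated assumptions: the thresholds and representative intensities are hypothetical placeholders (in the paper they come from the wavelet zero-crossing segmentation), and the table stores G(r_a - r_b) for every pair of representatives so that a lookup needs only comparisons.

```python
import bisect
import math


def gaussian(s, var):
    """One-dimensional Gaussian density with variance var."""
    return math.exp(-0.5 * s * s / var) / math.sqrt(2.0 * math.pi * var)


def build_index_table(representatives, var):
    """Pre-compute G(r_a - r_b) for every pair of relevant intensities."""
    return [[gaussian(ra - rb, var) for rb in representatives]
            for ra in representatives]


def segment_index(intensity, thresholds):
    """Map an intensity to its segment using the sorted segment boundaries."""
    return bisect.bisect_right(thresholds, intensity)


# Hypothetical segmentation of the intensity range into three segments.
thresholds = [1000, 2500]            # boundaries between segments
representatives = [400, 1800, 3200]  # one relevant intensity per segment
VAR = 250_000.0                      # illustrative variance
table = build_index_table(representatives, VAR)


def gdf_lookup(vi, vj):
    """Approximate G(vi - vj) by a table entry; no exponential is evaluated."""
    return table[segment_index(vi, thresholds)][segment_index(vj, thresholds)]
```

With 15 segments, as in Sect. 4, the same construction gives a 15x15 table, so pre-processing needs only 225 exponential evaluations.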

Given a sample intensity u_i or v_i, it is also important to efficiently retrieve
the corresponding GDF value from the index table. In general, binary search
cannot, on average, complete the search of n items in fewer than lg n comparisons.
From the histogram analysis of the intensities, it is apparent that
some intensities occur more often than others. This property of the distribution
can be exploited by using a search method that compares the sample with items
in order of the items’ probability of occurrence. This probability
priority search method is preferable since it locates a given item quickly. During
the segmentation in the pre-processing phase, the probability of each item can
be approximated by the area under each segment curve in the histogram.
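
A probability priority search of this kind might look like the sketch below: segments are tested in order of decreasing probability, so frequent intensities are found after very few comparisons. The segment boundaries and probabilities are invented for illustration, and the exact data structure used by the authors is not given in the text.

```python
def build_priority_order(segments):
    """Indices of (low, high, probability) segments, most probable first."""
    return sorted(range(len(segments)), key=lambda i: -segments[i][2])


def priority_search(intensity, segments, order):
    """Return (segment index, number of comparisons) for an intensity."""
    for comparisons, idx in enumerate(order, start=1):
        low, high, _ = segments[idx]
        if low <= intensity < high:
            return idx, comparisons
    raise ValueError("intensity outside all segments")


def expected_comparisons(segments, order):
    """Average number of segment tests under the segment probabilities."""
    return sum(rank * segments[idx][2]
               for rank, idx in enumerate(order, start=1))


# Hypothetical segments: (low, high, probability from histogram area).
segments = [(0, 1000, 0.70), (1000, 2500, 0.05), (2500, 4097, 0.25)]
order = build_priority_order(segments)
found = priority_search(3000, segments, order)
average = expected_comparisons(segments, order)
```

In this toy case the average is 1.35 segment tests per lookup, against 1.85 if the segments were tested in intensity order.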

4 Implementation and Results 

The proposed method was implemented on an SGI O2 workstation. We investigated
the performance of our registration scheme by aligning 3D MR with CT
images. For the results shown in Table 1 and Table 2, MR data served as the
reference and CT as the test data. The automatic threshold scheme partitioned
the intensity interval [0,4096] into 15 segments. The pre-processing produced a
15x15 table. Table 1 provides a comparison between the two schemes. The results
of the proposed scheme are nearly identical to those of the original method.
Table 2 compares the average computation time of the two methods, which
includes the pre-processing time and the average time for each iteration. Fig. 1
shows the time taken by the two methods for different numbers of samples. It
can be seen that the speedup of the proposed approach increases with increasing
number of samples. Fig. 2 shows some examples of the final configuration of the
MR-CT registration obtained using the proposed approach.

Table 1. Comparison of results between original and proposed scheme

            Rotation vector (rad)          Translation vector (mm)
scheme      rx        ry       rz          tx      ty       tz
original    -0.0652   0.0321   -0.0118     5.873   -0.325   120.82
proposed    -0.0647   0.0326   -0.0122     5.924   -0.317   120.01

Table 2. Timing comparison for two schemes

scheme                     Pre-processing  Samples       Time per         Total time for
                           time (sec)      (N_A = N_B)   iteration (sec)  6000 iterations (sec)
original                   0.985           50            0.0172           104.18
                                           80            0.0487           293.18
                                           100           0.0746           448.58
proposed (15x15 table)     1.011           50            0.0062           38.21
                                           80            0.0103           62.81
                                           100           0.0134           81.41
proposed (256x256 table)   1.328           50            0.0063           39.12
                                           80            0.0105           64.32
                                           100           0.0137           83.53
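
The speedup figures quoted in the abstract can be reproduced from Table 2, taking the total time as the pre-processing time plus 6000 times the per-iteration time. The sketch below uses the 15x15 table timings; the dictionary layout and function names are introduced only for this illustration.

```python
# Timings copied from Table 2 (seconds).
PRE = {"original": 0.985, "proposed": 1.011}
PER_ITER = {
    "original": {50: 0.0172, 80: 0.0487, 100: 0.0746},
    "proposed": {50: 0.0062, 80: 0.0103, 100: 0.0134},  # 15x15 table
}
ITERATIONS = 6000


def total_time(scheme, samples):
    """Pre-processing time plus the time for all iterations."""
    return PRE[scheme] + ITERATIONS * PER_ITER[scheme][samples]


def speedup(samples):
    """Ratio of original to proposed total time."""
    return total_time("original", samples) / total_time("proposed", samples)


speedups = {n: round(speedup(n), 2) for n in (50, 80, 100)}
```

The ratios come out at about 2.73, 4.67 and 5.51 for 50, 80 and 100 samples, matching the values of 2.73 and 5.5 stated in the abstract.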




Fig. 1. Number of samples vs. time (6000 iterations)

[Fig. 2 panels: original CT and original MR for four datasets; image dimensions and voxel sizes]
Row 1: CT 256x256x87 (0.938x0.938x3.0); MRI 256x256x111 (0.938x0.938x1.0)
Row 2: CT 512x512x80 (0.469x0.469x1.0); MRI 256x256x96 (0.898x0.898x2.0)
Row 3: CT 512x512x75 (0.469x0.469x1.0); MRI 256x256x96 (0.898x0.898x2.0)
Row 4: CT 512x512x19 (0.469x0.469x5.0); MRI 256x256x23 (0.898x0.898x6.5)

Fig. 2. Qualitative results of the proposed MI algorithm. The normal dataset used in
the first row corresponds to the one used in Tables 1 and 2. The last three rows
correspond to pathological cases



5 Conclusion 

We have presented a fast MI method for registering multi-modal data. The
proposed method provides nearly identical results in a fraction of the time
required by the original method. We are in the process of integrating this
algorithm into the VIVIAN neurosurgery planning system, which will help us to
validate the results clinically.

Abstract. We investigated 7 different similarity measures for rigid body
registration of serial MR brain scans. To assess their accuracy we used a 
set of 33 clinical 3D serial MR images, manually segmented by a radiol- 
ogist to remove deformable extra-dural tissue, and also simulated brain 
model data. For each measure we determined the consistency of registra- 
tion transformations for both sets of segmented and unsegmented data. 
The difference images produced by registration with and without seg- 
mentation were visually inspected by two radiologists in a blinded study. 
We have shown that of the measures tested, those based on joint entropy 
produced the best consistency and seemed least sensitive to the presence 
of extra-dural tissue. For this data the difference in accuracy of these 
joint entropy measures, with or without brain segmentation, was within 
the threshold of visually detectable change in the difference images. 



1 Introduction 

In this paper, we report the results of a systematic comparison of seven similarity 
measures for serial MR registration. We assess the accuracy of the measures using 
simulated MR brain images, and quantify consistency using images from a
clinical study [5]. We compare the performance of the measures on the clinical
data with, and without segmentation of extra-dural tissue. We interpret these 
results in the context of a blinded visual assessment study. 

2 Methods 

Our clinical data is from five growth hormone deficient adults undergoing therapy
and six normal subjects. Each subject was scanned 3 times at 3-monthly
intervals. An additional normal subject was scanned twice on the same day,
for assessing observer sensitivity to synthetic misregistration. All images were
axial T1 weighted 3D spoiled gradient echo with 1 x 1 x 1.8 mm voxels, including
head and brain stem. A phantom was scanned to measure scaling errors.
The clinical images were manually scalp segmented by a radiologist to eliminate 
deformable extra-dural tissue, using Analyze (Mayo Clinic, Rochester, MN, US). 

Simulated MR Brain Image with Added Noise and Distortion Simulated
data was based on the McGill full anatomical MR brain model image [2].
Two noiseless images were used, one with 40% RF inhomogeneity intensity
distortion and one without. Noise in a modulus MR image is Rician distributed. To
simulate Rician noise a numerical complex random variable was added to each
voxel of the noiseless (real) image and then the modulus was taken to produce
a magnitude image. The random variable was constructed from 2 Gaussian
distributed ones for the real and imaginary parts. The simulated Rician noise was
parameterised by measuring the mean and standard deviation of intensities of
an artefact free region of a clinical scan corresponding to air.
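
The noise model described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the image is treated as a flat list of voxel intensities, and `sigma_from_air_mean` relies on the standard result that the magnitude of pure noise (zero signal, as in air) is Rayleigh distributed with mean σ√(π/2); how the authors combined the measured mean and standard deviation is not stated.

```python
import math
import random


def add_rician_noise(image, sigma, rng):
    """Add complex Gaussian noise to a real image and take the modulus."""
    noisy = []
    for value in image:
        real = value + rng.gauss(0.0, sigma)  # real part: signal plus noise
        imag = rng.gauss(0.0, sigma)          # imaginary part: noise only
        noisy.append(math.hypot(real, imag))  # modulus gives magnitude image
    return noisy


def sigma_from_air_mean(air_mean):
    """Estimate sigma from the mean magnitude of a signal-free (air) region."""
    return air_mean / math.sqrt(math.pi / 2.0)


rng = random.Random(0)
air = add_rician_noise([0.0] * 50_000, sigma=2.0, rng=rng)
air_mean = sum(air) / len(air)
sigma_estimate = sigma_from_air_mean(air_mean)
```

In regions of high signal the result is close to additive Gaussian noise, while in air it is strictly positive, which is the behaviour the simulation is meant to capture.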



Similarity Measures and Registration Algorithm The ideal similarity
measure would have one optimum at the point of registration. Viola states that
for images that differ only by Gaussian noise, the mean squared difference (chi)
measure is optimal; with a linear intensity transformation the Pearson product
moment measure is optimal, and where the intensity transformation is unknown
joint entropy is likely to be optimal. Two important properties of serial MR
images that affect similarity measures are: intensity distortion (due to RF
inhomogeneity and motion artefact) and deformation of extra-cranial tissue
(approximately 20% of typical brain scans). We have implemented 3 measures used
by other researchers in serial MR: (1) mean squared difference in intensities
(chi); (2) Pearson product-moment cross correlation (ncc); (3) ratio image
uniformity (riu). We have also implemented 4 measures proposed for other medical
image matching applications: (4) mutual information (mi); (5) normalised mutual
information (nmi); (6) entropy of the difference image (edi); (7) pattern
intensity, radius 1, σ = 10, (pi). The measures can be put into two groups:
(a) those based on entropy: mi, nmi, edi and (b) those based on correlation:
chi, ncc, pi, riu. Our algorithm optimises the measures using a multi-resolution
strategy similar to that of Studholme.



Consistency of Two Transformations For two rigid-body transformations
T1 and T2 in homogeneous form, T2T1 is the result of first applying T1 then
T2. Given two transformation estimates T_a and T_b, mapping points, p(i), from
image 1 to image 2, the difference between these transformations is the mean
voxel displacement

$$\langle dp \rangle = \frac{1}{N_0} \sum_{i \in I_0} \left| T_b^{-1} T_a\, p(i) - p(i) \right|$$

over the brain region I_0 containing N_0 voxels. The RMS analogue is:

$$dp_{rms} = \sqrt{\frac{1}{N_0} \sum_{i \in I_0} \left| T_b^{-1} T_a\, p(i) - p(i) \right|^2}.$$



Consistency of 3 Transformations For N images there are P(N, 2) = N!/(N - 2)!
possible transformations. So for 3 images there are 6 different transformations
between image pairs. If we consider 3 transformations T12, T23, T31 between
image pairs (T12 transforms image 1 into image 2) then in the absence of error,
T31T23T12 is the identity I. Registration solutions, inevitably, have some error
so: T31T23T12 = I + ΔT, i.e. ΔT = T31T23T12 - I is the error (internal
inconsistency). Applying the error transformation to each voxel location, p(i), and
taking the modulus, the mean error over the image is: (1/N_0) Σ_{i ∈ I_0} |ΔT p(i)|.
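
Both consistency measures can be sketched with 4x4 homogeneous matrices in plain Python. The helper names, the restriction to rotations about the z axis, and the small grid of test points are assumptions made to keep the illustration short; the paper evaluates the displacements over all voxels of the segmented brain region.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices; matmul(t2, t1) applies t1 first."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(t, p):
    """Apply a homogeneous transform to a 3D point."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))


def rigid(rz, tx, ty, tz):
    """Rigid transform: rotation rz (rad) about z, then translation."""
    c, s = math.cos(rz), math.sin(rz)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def invert_rigid(t):
    """Inverse of a rigid transform: transposed rotation, back-rotated shift."""
    r = [[t[j][i] for j in range(3)] for i in range(3)]
    d = [-sum(r[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r[0] + [d[0]], r[1] + [d[1]], r[2] + [d[2]], [0.0, 0.0, 0.0, 1.0]]


def displacement_stats(t, points):
    """Mean, RMS and maximum of |t p - p| over the given points."""
    d = [math.dist(apply(t, p), p) for p in points]
    return sum(d) / len(d), math.sqrt(sum(x * x for x in d) / len(d)), max(d)


def pair_consistency(ta, tb, points):
    """Difference between two estimates of the same transformation."""
    return displacement_stats(matmul(invert_rigid(tb), ta), points)


def triangle_consistency(t12, t23, t31, points):
    """Internal inconsistency of the loop T31 T23 T12 (identity if exact)."""
    return displacement_stats(matmul(t31, matmul(t23, t12)), points)


n_pairs = math.perm(3, 2)  # ordered image pairs for 3 images
points = [(x, y, z) for x in (0, 50, 100) for y in (0, 50, 100) for z in (0, 40)]
t12 = rigid(0.10, 1.0, 2.0, 3.0)
t23 = rigid(-0.05, 0.5, 0.0, 1.0)
t31_exact = invert_rigid(matmul(t23, t12))
loop_error = triangle_consistency(t12, t23, t31_exact, points)
```

Because (T - I)p equals Tp - p for a homogeneous point, `displacement_stats` applied to the loop product gives exactly the mean of |ΔT p(i)|.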

Registration of Clinical Data and Measurement of Consistency All
registration was rigid body (6 degrees of freedom) and used 5 resolution levels. The
search interval ranged from 4 mm or degrees to 0.01 mm or degrees. For all 11
subjects, the first image (baseline) was registered to the second, the second to the 
third and the third to the first, giving 33 transformations for unsegmented im- 
ages and 33 for images where the target was segmented. A set of 66 registrations 
was performed with each similarity measure. The consistency of 33 transforma- 
tion estimates obtained without segmentation and 33 with segmentation was 
calculated. The triangular (internal) consistency for 11 measurements with
segmentation and 11 without was also determined. Each consistency measurement
was expressed as the mean, RMS, and maximum brain voxel shift (μm).



Assessment of Difference Images from Clinical Data Three sets of differ- 
ence images, derived from different groups of subjects, were used during assess- 
ment: the first was used to train radiologists, the second to test their abilities at 
detecting misregistration, and the third for assessment of misregistration differ- 
ences between data registered with or without prior segmentation. For training, 
difference images were created with varying amounts of misregistration. For
testing the radiologists’ ability to detect misregistration, the two consecutive scans
of the normal subject were used to eliminate the possibility of any anatomical
change in subject or scanner calibration. The second image was registered to the
first by maximising normalised mutual information and transformed into
the coordinate frame of the first by sinc interpolation (radius 6). The first
image was then subtracted from the aligned second one to produce a difference
image which corresponded to no added misregistration. Ten increasing amounts
of misregistration were added synthetically by calculating successively scaled
down versions of the original 6D transformation (corresponding to mean voxel
shifts of 50 to 500 μm in 50 μm steps). Difference images for the clinical study were
produced by registering the second and third images to the first by maximising 
normalised mutual information (as above). For each subject the second and third 
images were then transformed into the coordinate frame of the first (as above) 
and the first image was subtracted to produce two difference images (2 - 1 and
3 - 1) from registration with segmented data and two from registration with
unsegmented data. Radiologists were trained to recognise different amounts of 
misregistration using the training set. Then they rated the misregistration of 
each randomised difference image on a 7 point scale. 



Registration of Simulated Data We used the noiseless brain model image
and the noiseless brain model with 40% RF inhomogeneity from McGill
University [2] to create four image pairs: (a) 2 identical images; (b) 2 images with
added noise; (c) 1 noiseless image with RF inhomogeneity and one without; (d)
2 images with added noise, 1 with and 1 without RF inhomogeneity.

Fig. 1. Axial planes through clinical images: non-segmented (left), segmented,
difference, McGill with simulated Rician noise (right)



3 Results 

Three sets of results are given: (1) Consistency measurements for transformation 
estimates from registration of clinical data (segmented and not segmented) for 
the 7 measures. (2) Scores from radiologists’ visual assessment of clinical differ- 
ence images (segmented and not segmented). (3) Consistency measurements of 
the transformation estimates obtained from registration of the simulated data 
with the 7 measures. All consistency measurements correspond to the mean,
RMS, and maximum voxel displacements over the segmented brain region and
are given in μm, rounded to the nearest μm.



Registration Consistency for the 7 Similarity Measures The mean /
standard deviation of 33 measurements of the mean voxel shift (μm) for
registration solutions with and without segmentation of clinical data were: 122/46
(mi), 121/48 (nmi), 164/74 (ncc), 175/76 (chi), 8429/5316 (edi), 700/1503 (pi),
880/609 (riu). The smallest mean/standard deviation of 33 measurements of
the maximum voxel shift were 223/96 (mi), 222/96 (nmi). Table 1 shows
averaged consistency measurements for T31T23T12 with each of the 7 measures, for
segmented and unsegmented data.



Visual Assessment of Difference Images Assessed misregistration was
correlated with the added misregistration. For observer A the Spearman rank
correlation coefficient (ρ) was 0.96; for observer B, ρ was 0.79. Inter-observer
agreement was also tested and ρ was 0.85. These results suggest that radiologists are
sensitive to misregistration in difference images corresponding to a mean, RMS
and maximum voxel shift, over the brain, of 195, 199 and 299 μm respectively.
There was no significant difference in perceived misregistration with and without
segmentation using the nmi measure (p = 0.35).



Registration Consistency with Simulated Images Registration accuracy
was measured by comparing the transformation estimate with the identity using
6 different starting estimates. The mean/standard deviation of voxel shift for
the 6 starting transformations for the four image pairs was (μm): (a) less than
10/3 for all measures; (b) 127/2 (chi), 135/3 (edi), 121/2 (mi), 126/2 (ncc), 
137/3 (nmi), 343/23 (pi), 416/74 (riu); (c) 8/2 (chi); 39/14 (edi); 51/1 (mi); 
25/5 (ncc); 43/6 (nmi); 12/1 (pi); 30/7 (riu). There was a failure for riu which 
was omitted; (d) 203/3 (chi); 253/2 (edi); 163/5 (mi); 133/2 (ncc); 194/7 (nmi); 
391/26 (pi); 402/57 (riu). 



Table 1. Mean (standard deviation) of 11 consistency measurements (T31T23T12), in μm.
Registration without prior segmentation (left) and with prior segmentation (right)

           unsegmented data                           segmented data
measure    mean          RMS           max            mean       RMS         max
chi        91 (28)       96 (31)       165 (55)       99 (31)    104 (33)    169 (62)
edi        1757 (1148)   1780 (1198)   2160 (2072)    66 (35)    69 (36)     117 (55)
mi         88 (23)       94 (26)       168 (53)       78 (29)    82 (31)     139 (59)
ncc        87 (37)       93 (40)       168 (73)       97 (25)    101 (27)    168 (63)
nmi        86 (32)       92 (35)       162 (70)       78 (29)    81 (30)     133 (57)
pi         1565 (2204)   1690 (2391)   3348 (4809)    145 (94)   154 (100)   278 (181)
riu        1221 (553)    1276 (549)    2100 (748)     258 (92)   276 (101)   531 (224)



4 Discussion and Conclusion 

Table 1 shows that 4 of the 7 measures produced transformation estimates that
were consistent to within 331 μm whether or not the data was pre-segmented
and also had the best internal consistency (T31T23T12) for non-segmented data.
For the joint entropy measures the mean of the maximum inconsistency between
registrations with and without segmentation was 223 μm. The results from
visual assessment of synthetically misregistered data indicated that the threshold
for detecting misregistration corresponded to a mean and maximum
inconsistency of about 200 and 300 μm respectively. These thresholds are larger
than the average measured mean and maximum inconsistency, suggesting that
the measured inconsistencies are too small to be reliably detected by visual
inspection of difference images. For non-segmented data, there was little difference in
the internal consistency of transformation estimates for those measures based
on correlation (chi and ncc) and for those based on joint entropy (mi and nmi).
However, registration results with and without prior segmentation were more
self-consistent for those based on joint entropy. Results with the simulated
images suggested that image noise had a significant effect on registration accuracy.
However, the highest resolution matching was done with images at the original
resolution without any filtering to reduce the impact of noise. It is possible that
low pass filtering with intensity thresholding might improve performance of some
measures. Our results show that the similarity measures based on mutual
information are the most suitable for rigid body registration of serial MR images of
the head. Using our optimisation strategy we achieve registration solutions with
and without extra-dural tissue segmentation that are consistent to within the
threshold of observer discernibility (i.e. 200-300 μm). Our results apply under
the conditions of typical scalp deformations and small scale anatomical change.

Abstract. Freehand 3D ultrasound imaging produces a set of irregu-
larly spaced B-scans, which are typically reconstructed on a regular grid 
for visualisation and data analysis. Most standard reconstruction algo- 
rithms are designed to minimise computational requirements and do not 
exploit the underlying shape of the data. We investigate whether approxi- 
mation with splines holds any promise as a better reconstruction method. 
A radial basis function approximation method is implemented and com- 
pared with three standard methods. While the radial basis approach is 
computationally expensive, it produces accurate reconstructions without 
the kind of visible artifacts common with the standard methods. 



1 Introduction 

In freehand 3D ultrasound, a position sensor is attached to a conventional ul- 
trasound probe and a set of 2D B-scans are acquired, along with their relative 
locations. This allows the irregularly spaced B-scans to be reconstructed into a 
regular 3D voxel array for visualisation. The reconstruction step is important: 
any loss of image quality, or the introduction of artifacts, should be avoided. 

The literature reveals several reconstruction methods, which are all rather 
simple because they were designed to minimise the time and memory require- 
ments. The most common methods are voxel nearest neighbour (VNN), pixel 
nearest neighbour (PNN) and distance-weighted (DW) interpolation. 

VNN interpolation is easy to understand: each voxel is assigned the value
of the nearest B-scan pixel. There are no parameters to set. In common with
the other reconstruction techniques, reconstruction artifacts can be observed
in slices through the voxel array, since the interpolated image is a collage of
projections from the intersected B-scans. Registration errors, including tissue
motion and sensor errors, contribute to slight misalignment of the B-scans. This
results in mismatches among the neighbouring pieces of the collage. The lines of
intersection between the pieces then become visible - see Fig. 1(a).
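
A brute-force version of VNN interpolation can be written directly from this description. The sketch assumes that every B-scan pixel has already been mapped to 3D coordinates using the position sensor readings; practical implementations avoid the exhaustive search shown here, which is kept only for clarity.

```python
import math


def vnn_reconstruct(pixels, grid_points):
    """Assign to each voxel centre the value of the nearest B-scan pixel.

    pixels      : list of ((x, y, z), value) for all B-scan pixels
    grid_points : list of (x, y, z) voxel centres
    """
    voxels = []
    for g in grid_points:
        # Exhaustive nearest-neighbour search over all pixels.
        _, value = min(pixels, key=lambda pv: math.dist(pv[0], g))
        voxels.append(value)
    return voxels


pixels = [((0.0, 0.0, 0.0), 10), ((1.0, 0.0, 0.0), 20)]
grid = [(0.2, 0.0, 0.0), (0.9, 0.0, 0.0)]
vnn_values = vnn_reconstruct(pixels, grid)
```

Neighbouring voxels can switch abruptly from one B-scan to another, which is the source of the collage effect described above.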

The two-stage PNN algorithm is the most popular reconstruction method 0. 
In the first stage (bin- filling), the algorithm runs through each pixel in every B- 
scan and fills the nearest voxel with the value of that pixel. Multiple contributions 
Fig. 1. Reconstruction artifacts. In (a), VNN interpolation is used to reconstruct
an examination of a neck muscle. In (b), PNN interpolation is used for an examination 
of the thyroid. In (c), DW interpolation is used for an examination of the bladder 



to the same voxel are usually averaged. The parameters to set at this stage are 
the weights on the multiple contributions. The second stage (hole-filling) fills 
any remaining gaps in the voxel array. A variety of hole-filling methods have 
been used, including averaging of filled voxels in a local neighbourhood. The
parameters to set at this stage are the weights of the voxels used to fill the gaps. 
Artifacts can be generated by this two stage process: a slice passing through 
both first stage and second stage filled voxels may show the boundary between 
the bin-filled regions and the smoothed hole-filled regions (see Fig. 1(b)).

Like VNN, DW interpolation proceeds voxel by voxel. Instead of using the 
nearest pixel, each voxel is assigned the weighted average of some set of pixels 
from nearby B-scans. The parameters to choose are the weight function and the 
size and shape of the neighbourhood. The simplest approach employs a spherical 
neighbourhood of radius r_max around each voxel. All pixels in the sphere are
weighted by the inverse distance to the voxel and then averaged. If r_max is too
small, gaps may result, as in Fig. 1(c). Yet if r_max is too large, the voxel array
will be highly smoothed, since the effect of the weighting is quickly swamped by 
the larger number of data points falling into the larger local neighbourhood. 
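
The simplest DW scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function name `dw_voxel`, the layout of the pixel list and the handling of a pixel that coincides with the voxel centre (its value is returned directly, which corresponds to an infinite weight) are assumptions made for the example.

```python
import math


def dw_voxel(voxel, pixels, r_max):
    """Inverse-distance-weighted value for a single voxel.

    voxel  : (x, y, z) centre of the voxel
    pixels : iterable of ((x, y, z), intensity) pairs from nearby B-scans
    r_max  : radius of the spherical neighbourhood
    Returns None when no pixel lies inside the sphere (a gap).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for position, intensity in pixels:
        d = math.dist(voxel, position)
        if d > r_max:
            continue  # outside the spherical neighbourhood
        if d == 0.0:
            return float(intensity)  # coincident pixel: infinite weight
        w = 1.0 / d
        weighted_sum += w * intensity
        weight_total += w
    if weight_total == 0.0:
        return None  # r_max too small: the voxel stays unfilled
    return weighted_sum / weight_total
```

A `None` result marks a voxel that would remain unfilled, which is the situation that arises when r_max is chosen too small.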

2 Radial Basis Function Interpolation 

There have been no previously published attempts at functional interpolation 
of freehand 3D ultrasound data, since there are severe computational demands 
to overcome. After surveying recent advancements in trivariate interpolation of 
large data sets, a method was discovered that ideally suits the freehand 3D 
ultrasound reconstruction problem. This method was developed by researchers 
at the University of Illinois for interpolation of multivariate geographical data 
sets 0. They dubbed the method “completely regularized splines with tension”. 

Consider a set of pixel values p_j (j = 1 ... N) located at positions x_j =
(x_j, y_j, z_j) with respect to the voxel array. The goal is to find a spline S(x) that
passes as close as possible to the data points and is as smooth as possible. These
two requirements can be combined in such a way that S(x) fulfills

$$ \sum_{j=1}^{N} |p_j - S(\mathbf{x}_j)|^2 + w\,I(S) = \mathrm{minimum}. \tag{1} $$

The first component is the deviation of the spline from the data points, and the
second is a smoothness function I(S). The weight w determines the relative cost
of the two components. The solution can be expressed as

$$ S(\mathbf{x}) = T(\mathbf{x}) + \sum_{j=1}^{N} a_j R(\mathbf{x}, \mathbf{x}_j) \tag{2} $$

where T(x) is the trend function, a_j are scalar coefficients, and R(x, x_j) is an
RBF (radial basis function) whose form depends on the choice of I(S).

For the 2D case, if I(S) is chosen to minimise the cost of the second derivatives,
the familiar thin plate spline results. If the same I(S) is used for the 3D
case, the first derivatives of the RBF become divergent at the data points. By
choosing a more general I(S), we obtain an analytic expression for the RBF with
regular derivatives of all orders [2]. This results in T(x) = a_0, a constant, and



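$$ R(\mathbf{x}, \mathbf{x}_j) = \frac{\varphi^3}{4\pi} \left[ \frac{1}{\varphi r}\, \mathrm{erf}\!\left( \frac{\varphi r}{2} \right) - \frac{1}{\sqrt{\pi}} \right] \tag{3} $$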



where r = |x − x_j| is the distance from x to x_j, and erf is the error function.
The parameter φ is a generalised tension parameter, controlling the distance
over which the point influences the resulting hypersurface. The multiplicative
constant φ^3/4π can be omitted, since it can be combined with the coefficients
a_j. The spline coefficients can then be found by solving the set of linear equations

$$ a_0 + \sum_{j=1}^{N} a_j \left[ R(\mathbf{x}_i, \mathbf{x}_j) + \delta_{ij} w \right] = p_i \quad \text{for } i = 1 \ldots N, \qquad \text{and} \qquad \sum_{j=1}^{N} a_j = 0 \tag{4} $$

where δ_ij is the Kronecker delta function. There are two parameters to set: φ
controls the tension, and w controls the level of approximation. The goal of 
tuning the parameters is to find the optimal balance between the requirements 
of obtaining small deviations from the data points and avoiding overshoots. 
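
As an illustration of how the linear system (4) might be assembled and solved for one small segment, here is a sketch in plain Python. It assumes that the RBF has the form R(r) = erf(φr/2)/(φr) − 1/√π, with the multiplicative constant omitted and R(0) = 0 as the limiting value. The helper names (`rbf`, `solve`, `fit_rbf`, `evaluate`) and the dense Gaussian elimination are choices made for the example only; a practical implementation would use an optimised solver together with the segment windows described below.

```python
import math


def rbf(r, phi):
    """Assumed tension-spline RBF, multiplicative constant omitted."""
    if r == 0.0:
        return 0.0  # erf(phi*r/2)/(phi*r) tends to 1/sqrt(pi) as r -> 0
    return math.erf(phi * r / 2.0) / (phi * r) - 1.0 / math.sqrt(math.pi)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rbf(points, values, phi, w):
    """Return (a0, coeffs) for the data points of one segment."""
    n = len(points)
    a = [[0.0] + [1.0] * n]  # constraint row: sum of a_j = 0
    for i in range(n):
        row = [1.0]  # column multiplying the constant trend a0
        for j in range(n):
            r = math.dist(points[i], points[j])
            row.append(rbf(r, phi) + (w if i == j else 0.0))
        a.append(row)
    sol = solve(a, [0.0] + list(values))
    return sol[0], sol[1:]


def evaluate(x, points, a0, coeffs, phi):
    """Evaluate S(x) = a0 + sum_j a_j R(x, x_j)."""
    return a0 + sum(c * rbf(math.dist(x, p), phi) for c, p in zip(coeffs, points))
```

With w = 0 the fitted function passes through the data points; a positive w lets it miss them in exchange for a smoother result.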

For computational efficiency, the RBF interpolant cannot be calculated using 
all the data points of an ultrasound examination at once: the input data must 
be divided into manageable segments. Individual interpolating functions are 
then calculated for each segment. To ensure smooth connections among the 
RBF’s of neighbouring segments, overlapping windows are used. A window is 
established around each segment in such a way that it encompasses not only all 
the data points in the segment but also a sufficient number of neighbouring data 
points. All data points in the window are then used to calculate the RBF’s for 
that segment. Since the windows overlap each other, the RBF for each segment 
will closely match the neighbouring RBF’s. Full details of a novel windowing 
technique suitable for use with freehand 3D ultrasound data can be found in [5].

3 Comparison of Reconstruction Methods

A fan-shaped sweep of a human bladder was performed in vivo with a 3.75 MHz 
curvilinear array probe (0.34 mm wide pixels). Since the true anatomical function 
is unknown, the reconstruction methods were tested by artificially removing data 
from the examination, and then evaluating their ability to predict the intensity 
values in the gaps. First, a B-scan near the middle of the sweep was selected. 
The voxel array (with voxels equal in size to the pixels) was aligned with this 
B-scan so that pixels fell exactly onto voxels. A percentage of the pixels were 
then removed randomly from the B-scan, creating gaps of various sizes. The rest 
of the pixels and all other B-scans were used in the interpolation to fill the voxel 
array. The values of the removed (original) pixels could then be compared with 
the values of the voxels aligned with them, and an average error computed: 

$$ V = \frac{1}{M} \sum_{i=1}^{M} |p_i - c_i| $$

where p_i is the original pixel that was removed from the reconstruction, c_i is the
interpolated value of the voxel aligned with p_i and M is the number of removed
pixels. A low value of V indicates a good ability to interpolate over the gaps. 
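
The evaluation procedure can be sketched as follows. The sketch assumes that V is the mean absolute difference between removed pixels and the voxels aligned with them; the function names, the fixed random seed and the 2D list representation of a B-scan are hypothetical and serve only to make the example self-contained.

```python
import random


def remove_pixels(shape, fraction, seed=0):
    """Pick a random subset of (row, col) positions to remove from a B-scan."""
    rows, cols = shape
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    rng = random.Random(seed)
    return rng.sample(coords, round(fraction * len(coords)))


def interpolation_error(original, reconstructed, removed):
    """Mean absolute difference over the removed pixel positions."""
    total = sum(abs(original[r][c] - reconstructed[r][c]) for r, c in removed)
    return total / len(removed)
```

In use, the positions returned by `remove_pixels` would be withheld from the reconstruction, and `interpolation_error` would then compare the withheld values with the aligned slice of the voxel array.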

The tests were performed with eight different percentages of removed data: 
0%, 25%, 50%, 75%, 100%, 300%, 500% and 700%. For the 25% to 100% tests, 
pixels were removed only from the selected B-scan n. The 300% test removed all 
of B-scan n and all of B-scans n−1 and n+1. The 500% and 700% tests removed
B-scans n±2 and n±3 as well. The 0% test was included because a reconstruction 
method may not replicate the original data points. For the 0% test alone, V was 
calculated over all pixels of the selected B-scan. The eight tests were repeated 
for ten different B-scans to give mean and variance estimates of V. 

Typical algorithms were implemented in each of the conventional reconstruc- 
tion categories and compared with the new RBF method. The hole-filling stage 
of the PNN algorithm used the average of the filled voxels in a 3 x 3 x 3 neighbour- 
hood. The remaining unfilled voxels were then filled by averaging originally filled 
voxels in a 5 X 5 X 5 neighbourhood and so on, until all voxels were filled. This is 
similar to the method described in P]. The DW method was implemented with 
an inverse distance weight within a spherical neighbourhood. r_max was set to
the smallest value which avoided gaps in the reconstructions. The RBF method
used the windowing technique described in [5]. Each segment contained at most
30 data points. The tension and approximation parameters were tuned manually
by viewing a slice of the voxel array. A low tension (φ = 25) combined with a
small amount of smoothing (w = 0.1) gave optimal results. These values fall 
within the range typically used for geographic data interpolation P|. 

4 Results 

The results are tabulated in Table 1; examples of the interpolated images can be
found in d . A second experiment with different data produced similar results [5].

Table 1. Interpolation error V. μ is the mean of V and σ is the standard deviation.
† means that the assertion μ > μ_RBF is statistically significant at a significance
level of 0.05. ★ means that the assertion μ < μ_RBF is statistically significant at a
significance level of 0.05. The assertions are tested with the paired-sample t-test.

Test    VNN             PNN             DW              RBF
        μ       σ       μ       σ       μ       σ       μ       σ
0%      0.00★   0.00    0.00★   0.00    0.00★   0.00    0.96    0.03
25%     5.60†   0.39    5.01†   0.22    5.37†   0.09    3.57    0.25
50%     5.50†   0.40    5.08†   0.25    5.32†   0.09    3.85    0.31
75%     5.27†   0.50    5.19†   0.35    5.24†   0.10    4.13    0.40
100%    4.13    0.38    5.25†   0.40    5.1F    0.14    4.29    0.37
300%    6.92    0.40    7.03†   0.15    6.85†   0.12    6.69    0.19
500%    8.50†   0.23    7.80†   0.14    7.62★   0.11    7.73    0.16
700%    9.37†   0.26    8.36    0.18    8.07★   0.09    8.37    0.16


The VNN method produced sharp, detailed reconstructions. At 25%, 50% 
and 75%, the nearest neighbours of the voxels came mainly from the remaining 
pixels of the selected B-scan. Therefore, the interpolated image appeared as a 
patchwork of irregularly shaped pieces and relatively large values of V result. For 
the 100% to 700% tests, the interpolated image was formed from the projection 
of pixels from the nearest B-scans. The join lines between the portions of the 
projected data were indiscernible, suggesting that registration errors were small 
and the images varied slowly from one B-scan to the next. 

The PNN method produced more blurred reconstructions. At 25%, 50% and 
75%, the gaps were filled mainly by averaging the remaining pixels in the original 
B-scan. The interpolated image appeared as a patchwork again, with relatively 
large values of V. The mean of V increases progressively for the 100% to 700% 
tests. The reconstructions exhibited significant artifacts, especially for the 500% 
and 700% tests. Visible boundaries were evident between portions filled, for 
example, using a 7 x 7 x 7 neighbourhood, and portions filled using a 9 x 9 x 9 
neighbourhood, because they involve different amounts of smoothing. 

The DW reconstructions also exhibited artifacts. At 25%, 50% and 75%, the 
reconstructions comprised voxels filled by the original data (weighted by infinity), 
along with voxels in the gaps that were calculated from a weighted average of 
neighbouring pixels. The interpolated image was therefore a combination of the 
original pixels and smoothed data in the gaps. Apart from progressive blurring 
as more data was removed, no other artifacts were apparent. 

The RBF technique performed marginally better than the others. At 25%, 
50% and 75%, the mean of V is considerably lower than the other methods and 
the resulting interpolated data appeared the most detailed and least artificial. 
This demonstrates the ability of a functional method to use the shape of the 
underlying data to interpolate across the gaps. Yet at percentages of 100% and 
greater, the RBF is not always significantly better than the other methods. One 
of the reasons for this is that the underlying shape of the anatomical data is lost 
when the gaps become too large. Another problem is that the RBF approaches
the trend term of the interpolation function in the largest gaps. In general, the 
RBF method produced no visible artifacts in the interpolated data, apart from 
progressive blurring as the percentage of removed data increased. 

The performance of the RBF was largely unchanged for tensions φ in the
range 10 to 25 and smoothing w in the range 0.01 to 0.1. A potential improvement 
lies in the use of anisotropic tension 0, which should be high within the B-scans 
to avoid overshoots, and low orthogonal to the B-scans to fill the gaps. This would 
reduce the blurring in the gaps between B-scans. 

The major disadvantage of the RBF technique is its considerable computa- 
tional expense. However, the segmentation of the voxel array means the RBF 
method is amenable to parallel processing. Since many modern ultrasound ma- 
chines already have the capacity for parallel processing (the Toshiba Powervision 
7000 used for these examinations contains more than 60 Pentium processors), a 
practical implementation of the RBF method is not infeasible. 

5 Conclusions 

The RBF method performs better than the traditional reconstruction techniques, 
though not remarkably so. However, many opportunities exist to exploit the 
unique properties of the RBF method. For example, derivatives can be calculated 
directly from the RBF’s. Accurate derivatives are often required in applications 
such as visualisation, registration and segmentation. A functional representation 
can also be useful for data compression and filtering. Also, since an approximat- 
ing function in general misses the data points, the distance it misses them by can 
be considered the predictive error. A large predictive error may be indicative of 
image misalignment, so determining which regions have large predictive errors 
can be useful for investigations into registration errors. 

Abstract. Approximate entropy (ApEn) is a computable measure of
sequential irregularity that is applicable to sequences of numbers of finite 
length. As such, it may be used to determine how random a sequence of 
numbers is. We exploit this property to determine the relevance of image 
information; to determine whether a spatial signal intensity distribution 
varies in a regular fashion — and is therefore likely to be an image feature 
or image texture, or is highly random — and likely to be noise. We 
present an outline of two possible methodologies for creating an ApEn- 
based noise filter: a modified median filter and a modified anisotropic 
diffusion scheme. We show that both approaches lead to effective noise 
reduction in MR images, with improved information-retaining properties
when compared with their conventional counterparts.



1 Introduction 

Nonlinear geometric schemes provide elegant methods for smoothing digital im- 
ages. Anisotropic diffusion [Q and its subsequent developments (see e.g. [21, 'I| 1 
use the magnitude of local intensity gradients to determine object edges to be 
preserved in preference to less significant gradients (assumed to be noise or 
structures of little interest), which are smoothed. An alternative method for 
identifying significant image information may be to determine whether the local 
spatial intensity distribution is ordered or random. We have investigated the use 
of approximate entropy (ApEn) [4l,^lf)j . a finite computable measure of sequential 
irregularity which is applicable to short sequences of numbers to determine local 
pixel intensity regularity. We investigate using ApEn to determine whether spa- 
tially fluctuating signal contains a degree of regularity — and is therefore likely 
to be an image feature or texture, or is highly random — and likely to be noise. 
From this, we construct effective noise reduction filters, which retain improved 
levels of detail when compared with existing methods. 

2 Theory and Application



To summarise the mathematical definition of ApEn: given an array of
size N and an integer m, under the conditions 0 ≤ m ≤ N, a sequence of real
numbers u := (u(1), u(2), ..., u(N)), and a real number r (where r ≥ 0), let
the distance between two sub-sequences x(i) = (u(i), u(i+1), ..., u(i+m−1))
and x(j) = (u(j), u(j+1), ..., u(j+m−1)) be defined as d(x(i), x(j)) =
max_{p=1,2,...,m} (|u(i+p−1) − u(j+p−1)|). Then let C_i^m(r) = {number of j ≤
(N − m + 1) such that d(x(i), x(j)) ≤ r}/(N − m + 1). Now define

$$ \Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \log C_i^m(r), \tag{1} $$

$$ \mathrm{ApEn}(m, r, N)(u) = \Phi^m(r) - \Phi^{m+1}(r). \tag{2} $$

ApEn(m, r, N)(u) may be interpreted as a measure of the maximum frequency
at which number sequences within u of length m occur compared with sequences
of length m + 1. High values of ApEn imply randomness; low values imply order.

We hypothesise that ApEn may be used to distinguish useful image informa- 
tion (edges, textures) from noise. We modify median and anisotropic diffusion 
schemes using the ApEn value derived from a local neighbourhood in a weight- 
ing function for existing smoothing schemes. We reduce smoothing when ApEn 
is low and allow smoothing when it is high. 

All ApEn calculations use the following parameters: N = 25 (a 5 × 5
neighbourhood), m = 1, r = 0. As the above definition of ApEn is for 1D sequences,
we treat the intensities within the neighbourhood as a 1D raster array of size N
for ApEn calculation. To minimise directional bias, the mean ApEn is calculated
from two 1D arrays, with data entered up/down and then left/right.
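
The definition above translates fairly directly into code. The following sketch assumes the standard form of ApEn, in which blocks match when their maximum element-wise difference is at most r (so that r = 0 counts exact matches), and it averages the row-wise and column-wise raster readings of a square patch. The function names `phi_m`, `apen` and `local_apen` are hypothetical.

```python
import math


def phi_m(u, m, r):
    """Mean log-frequency of length-m blocks of u that match within r."""
    n_blocks = len(u) - m + 1
    blocks = [u[i:i + m] for i in range(n_blocks)]
    total = 0.0
    for bi in blocks:
        matches = sum(
            1 for bj in blocks
            if max(abs(a - b) for a, b in zip(bi, bj)) <= r
        )
        total += math.log(matches / n_blocks)  # matches >= 1 (self-match)
    return total / n_blocks


def apen(u, m=1, r=0.0):
    """Approximate entropy of the 1D sequence u."""
    return phi_m(u, m, r) - phi_m(u, m + 1, r)


def local_apen(patch, m=1, r=0.0):
    """Mean ApEn of a square patch read row-wise and column-wise."""
    by_rows = [v for row in patch for v in row]
    by_cols = [v for col in zip(*patch) for v in col]
    return 0.5 * (apen(by_rows, m, r) + apen(by_cols, m, r))
```

A constant patch gives an ApEn of zero, a strictly alternating sequence gives a value close to zero, and an irregular sequence gives a clearly larger value.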



2.1 ApEn Median Filter 

The transformation of an image, k, with pixel intensity I_k(x, y) to the modified
median-filtered image with intensity I_{k+1}(x, y) is given by

$$ I_{k+1}(x,y) = \mathrm{ApEn}_k(x,y)_{\mathrm{local}}\, M(x,y)_{\mathrm{local}} + \left( 1 - \mathrm{ApEn}_k(x,y)_{\mathrm{local}} \right) I_k(x,y), \tag{3} $$

where ApEn_k(x, y)_local = ApEn(m, r, N)(u) calculated within the √N × √N
neighbourhood centred at (x, y) of the kth image and normalised over the whole
of the kth image; M(x, y)_local is the median intensity within the neighbourhood.
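
One iteration of this blend between the local median and the original intensity might look as follows. The sketch takes a precomputed ApEn map already scaled to [0, 1]; how that normalisation is carried out, the clipping of the window at the image border and the name `apen_median_step` are assumptions made for the example.

```python
from statistics import median


def apen_median_step(image, apen_norm, half=2):
    """One iteration of I_{k+1} = A * M + (1 - A) * I_k.

    image     : 2D list of intensities I_k
    apen_norm : 2D list of local ApEn values scaled to [0, 1]
    half      : neighbourhood half-width (2 gives a 5 x 5 window)
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            # Window is clipped at the image border (an assumption).
            window = [
                image[j][i]
                for j in range(max(0, y - half), min(h, y + half + 1))
                for i in range(max(0, x - half), min(w, x + half + 1))
            ]
            a = apen_norm[y][x]
            out[y][x] = a * median(window) + (1.0 - a) * image[y][x]
    return out
```

Where the normalised ApEn is 1 the pixel is replaced by the local median, and where it is 0 the pixel is left unchanged.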

Figures 1(a)-(d) show a comparison of the effects of the conventional and
ApEn-modified median filter. The modified filter produces sharper edges and
better preservation of detail due to ApEn being low (so restricting smoothing) in
regions dominated by structural information (e.g. tissue interfaces) and high (so
allowing smoothing) in regions of relatively constant mean intensity corrupted
by random noise (Fig. 1(b)).

Fig. 1. (a) MR image with added Gaussian noise (SD=5). (b) ApEn distribution.
(c-f) Image (a) after application of: (c) median filter (5 × 5 mask; 1 iteration); (d)
ApEn-modulated median filter (5 × 5 mask; 3 iterations); (e) anisotropic diffusion
(2 iterations); (f) ApEn-modulated anisotropic diffusion (3 iterations). Mean noise
reductions: (c) 71 %; (d) 74 %; (e) 58 %; (f) 62 %



2.2 ApEn Anisotropic Diffusion 

A formulation of the 2D edge-affected anisotropic diffusion scheme is

$$ \frac{\partial I}{\partial t} = \mathrm{div}\left[ g(\|\nabla I\|)\, \nabla I \right], \tag{4} $$

$$ g(\|\nabla I\|) = e^{-(\|\nabla I\| / K)^2}, \tag{5} $$

where t is an artificial time parameter, ∇I is the local intensity gradient, g(‖∇I‖)
is an ‘edge-stopping’ function, and K, the Canny noise estimator, is set to
85 %. This is applied as an explicit Euler forward scheme, with a regularising
scale of 0.8 and a time step of 0.25. We modify g by introducing

Fig. 2. Correlation coefficient (CC) as a function of noise level (arbitrary units).
Squares: anisotropic diffusion. Crosses: ApEn-modified anisotropic diffusion



ApEn{x,y) local as a modulating term (a similar approach to 0), to give G: 

G{\\\'I\\,ApEn{x,y)iocai) = e 1 ^ > ApEn{x,y) local , (6) 

where b is determined empirically. We use b = 1, causing G to be reduced at low 
ApEn values (ordered case) relative to high ApEn values (disordered case). 
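
A single explicit update of this kind can be sketched as below. The sketch uses the common four-neighbour discretisation with an exponential edge-stopping function, and for b = 1 it simply scales the update at each pixel by the normalised local ApEn. The discretisation, the omission of the regularising pre-smoothing, the use of a numeric gradient threshold `k` and the way the ApEn term enters are all assumptions made for illustration; they are not taken from the scheme as implemented.

```python
import math


def diffusion_step(image, k, dt=0.25, apen_norm=None):
    """One explicit four-neighbour diffusion update.

    image     : 2D list of intensities
    k         : gradient threshold in intensity units
    dt        : time step
    apen_norm : optional 2D list in [0, 1] that scales the update per pixel
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            flux = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    grad = image[ny][nx] - image[y][x]
                    # Edge-stopping function: small for large gradients.
                    flux += math.exp(-(grad / k) ** 2) * grad
            scale = 1.0 if apen_norm is None else apen_norm[y][x]
            out[y][x] = image[y][x] + dt * scale * flux
    return out
```

Because the scale factor never exceeds 1, the modulated update is never larger than the unmodulated one, which is consistent with the retarding effect described in the text.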

Figures 1(e) and (f) show a comparison of standard and ApEn-modified
anisotropic diffusion (the differences are less than in Figs. 1(c) and (d), as
anisotropic diffusion filters generally out-perform median filters). The unmodified
filter reduces noise more than the modified scheme for a given number of
iterations, due to the retarding effect of the ApEn modulation, but retains less
detail for a given noise reduction. Further differences are seen by examining how
useful image information is preserved through iterative filtering after the
application of Gaussian noise. We do this by calculating the correlation coefficient
between the filtered images and the original noise-free image (Fig. 2). The
ApEn-modified scheme retains more information for a given noise reduction, up
to what appears to be a stopping point (from experiments to date) . The unmod- 
ified scheme progresses to further noise reduction at the expense of information 
in the image as a whole. 

ApEn reduces the smoothing applied to textural features which, although
they do not possess well-defined edges, may represent important image information
(Fig. 3). More uniform regions experience approximately the same degree
of smoothing by both the modified and unmodified schemes. 

3 Conclusions 

We have estimated ApEn within small image neighbourhoods, and shown differences
between regions dominated by structural information and by noise (Fig.
1(b)). ApEn is then used to reduce noise by modulating the effects of existing

Fig. 3. (a) Original image. (b) Smoothed with anisotropic diffusion filter (5 iterations).
(c) Smoothed with modified anisotropic diffusion filter (5 iterations)



schemes. The modified schemes are weighted towards removing spatially random 
signal (noise), whilst retaining more orderly information. 

ApEn differs from conventional concepts of entropy El as its calculation 
includes steps concerning the relationship between neighbouring (or higher or- 
der distances, dependent upon the choice of m) intensity values, and how often 
these relationships occur, rather than relying upon a statistical description of
the histogram of all values within the region of interest. It is this point that also 
makes it distinct from measures such as intensity variance. We have made pre- 
liminary comparisons with filters based on the local 2D autocorrelation (adapted 
from ini) of pixel values, and found that the information retaining properties 
of using ApEn are superior (data not shown). 

The diffusion time (or number of iterations) required for a given noise reduction
is increased by ApEn modulation, due to its retarding effect (see (3) and
(6)). We normalise ApEn(x, y) over the whole image, so with b = 1 (see (6)), the
smoothing at each step is less than or equal to that possible with the unmodified
schemes, and the stability of the original scheme is not compromised. Less
image information is lost for a given noise reduction when incorporating ApEn. 

The parameters used to calculate ApEn, and for weighting the modulation
(see (3) and (6)), were chosen for effectiveness of noise suppression, feature
preservation, and ease of computation. The time for computation of ApEn scales
as ≈ 2m(N − m + 1)^2. We used small N, allowing relatively quick computation
and spatially localised neighbourhoods. However, there is a statistical advantage
in larger N, implying increased accuracy in calculated values of ApEn. The
effects of using alternative values of b, m, N, and r are left to future work.

The current pseudo-2D local ApEn calculation may potentially be developed
to a true 2D (and any other dimensionality) vector calculation of ApEn, as suggested
by Singer and Pincus. However, initial experiments (data not shown
here) suggest that our pseudo-2D approach is effectively rotation-invariant,
justifying the application of 1D ApEn calculations in a 2D setting.

We have presented a novel framework for noise reduction in medical images. 
We have applied our techniques to many medical and synthetic images (not 
shown here) and have found that they consistently out-perform the unmodified 
schemes. The techniques presented may have application to a range of MRI tech- 
niques including quantitative studies such as functional MRI, perfusion imaging, 
and diffusion tensor imaging, each of which typically suffers from low signal-to- 
noise ratios. The capability of ApEn to distinguish noise from image structure 
may also make it a suitable candidate for texture preserving filtering tasks. 

Abstract. One of the major drawbacks of Magnetic Resonance Imaging
(MRI) has been the lack of a standard and quantifiable interpretation 
of image intensities. This causes many difficulties in image display and 
analysis. We have devised a two-step method wherein all images can be 
transformed in such a way that for the same protocol and body region, in 
the transformed images similar intensities will have similar tissue meaning.
Normalized images can be displayed with fixed windows without the
need of per case adjustment. More importantly, extraction of quantitative
information about healthy organs or about abnormalities, such as tumors,
can considerably be simplified. This paper introduces and compares new 
variants of this normalization method that can help to overcome some 
of the problems with the original method. 



1 Introduction 

A variety of MRI protocols (for example pulse sequences) are currently avail- 
able that allow the setting up of different contrasts among the different tissues 
within the same organ system. Unfortunately, one of the major difficulties with 
the MRI techniques has been that intensities do not have a fixed meaning, not 
even within the same protocol for the same body region obtained on the same 
scanner for the same patient. This implies that MR images cannot be displayed 
at preset windows; one always has to adjust the window settings per case. The 
lack of a meaning for intensities also poses problems in image segmentation and 
quantification. What we need is that for protocols that are the same or “close” 
to each other, the resulting images should also be “close” . 

Attempts have been made to calibrate MR signal characteristics at the time 
of acquisition using phantoms. Postprocessing techniques that are applied to the 
image data that do not have any special acquisition requirements are however 
more attractive. There does not seem to have been any serious attempt to address 
this problem in the past. 

The method described in offers a simple way of transforming the images 
so that there is a significant gain in similarity of the resulting images. It is a 
two-step method consisting of a training step (executed only once for each
protocol and body region) and a transformation step (executed on each image).
This new transformation results in standard scales for different protocols and 
body regions. Intensities in the transformed images have meanings, and standard 
window settings can be determined for different tissues. However, the original, 
mode-based method is often not appropriate if the application is image segmen- 
tation, where we need more accurate meaning on the normalized scale even for 
relatively small ranges. This paper introduces and compares new variants of this 
normalization method that can help to overcome some of the problems with the 
original method. 



2 Methods 

Overview of the Normalization Method. We consider an image as a 3-dimensional
array of volume elements (voxels) with intensity values assigned to
each voxel. We assume that all “valid” intensities are positive integers and the
value 0 means “no measured data”. We denote the minimum and maximum
occurring intensities in an image by m_1 and m_2, respectively.

It is desirable to cut off the “tails” of the histogram of the image because 
they often cause problems. Usually the high intensity tail corresponds to artifacts 
and outlier intensities. With this in mind, let pc_1 and pc_2 denote the minimum
and maximum percentile values that are used to select a range of intensity of
interest (IOI). Let the actual intensity values corresponding to pc_1 and pc_2 in
the histogram be p_1 and p_2.

Based on over 20 body region/protocol combinations, we have observed 
mainly two types of histograms among MR images: unimodal and bimodal. In 
case of bimodal histograms, we can usually use the mode (μ) that corresponds
to the main foreground object in the image as a histogram landmark. With uni- 
modal histograms the mode usually corresponds to the background so we need to 
select some other landmark. This may be for example the shoulder of the hump 
of the background intensities. Since most of the protocols we studied produce 
bimodal histograms we will describe this case in more detail. 

Our overall approach is as follows. Let the minimum and the maximum intensities
on the standard scale for the IOI be s_1 and s_2, respectively. In the training
step, the landmarks (p_{1j}, p_{2j}, μ_j) obtained from each of a set of images are
mapped to the normalized scale by mapping the intensities from [p_{1j}, p_{2j}] onto
[s_1, s_2] linearly. Then the mean (μ_s) of these mapped μ_j's is computed. In the
transformation step, for any given image, the actual second mode μ_i obtained
from its histogram is matched to μ_s by doing two separate linear mappings: the
first from [p_{1i}, μ_i] to [s_1, μ_s] and the second from [μ_i, p_{2i}] to [μ_s, s_2].
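
The training and transformation steps can be sketched as follows. This is a minimal illustration under stated assumptions: the landmarks (the two percentile intensities and the mode) are taken as already extracted from each histogram, and the function names `linear_map`, `train` and `transform` are hypothetical.

```python
def linear_map(v, a0, a1, b0, b1):
    """Map v linearly from the interval [a0, a1] onto [b0, b1]."""
    return b0 + (v - a0) * (b1 - b0) / (a1 - a0)


def train(landmarks, s1, s2):
    """Training step.

    landmarks : list of (p1, p2, mu) tuples, one per training image
    Returns mu_s, the mean position of the mode on the standard scale.
    """
    mapped = [linear_map(mu, p1, p2, s1, s2) for p1, p2, mu in landmarks]
    return sum(mapped) / len(mapped)


def transform(v, p1, p2, mu, s1, s2, mu_s):
    """Transformation step: two linear sections joined at the mode."""
    if v <= mu:
        return linear_map(v, p1, mu, s1, mu_s)
    return linear_map(v, mu, p2, mu_s, s2)
```

After transformation, the mode of every image lands on the same standard-scale value, while the percentile landmarks land on the ends of the standard range.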



Choosing the Standardization Parameters. Although, once the training 
step is done, the corresponding transformation step is fully determined, there 
are several possibilities to tailor the normalization to the specific needs of an 
application. For example, s_1 should not be 0 if the values below p_1 need to be
distinguished from “nothing” (i.e., value 0). Further, s_2 − s_1 should be large
enough not to merge neighboring intensities after the transformation. When
pc_1 > 0 and/or pc_2 < 100, the values in [m_{1i}, p_{1i}] and [p_{2i}, m_{2i}] may be mapped
to [s'_{1i}, s_1] and [s_2, s'_{2i}], respectively, where s'_{1i} and s'_{2i} are determined by applying
the mapping in the two linear sections corresponding to [p_{1i}, μ_i] to [s_1, μ_s] and
[μ_i, p_{2i}] to [μ_s, s_2]. We refer to this scale as “open”. When all intensities in
[m_1, p_1] and [p_2, m_2] are mapped to s_1 and s_2, respectively, we refer to the scale
as “closed”.
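
The difference between the two scales can be made concrete with a short sketch. It assumes that the open scale simply extends each linear section beyond its percentile landmark, and that the closed scale clamps the tails; the function name `standardize` and the `scale` argument are hypothetical.

```python
def standardize(v, p1, p2, mu, s1, s2, mu_s, scale="closed"):
    """Map intensity v to the standard scale, including the histogram tails.

    closed : intensities below p1 map to s1, intensities above p2 map to s2
    open   : the two linear sections are extended beyond p1 and p2
    """
    if scale not in ("closed", "open"):
        raise ValueError("scale must be 'closed' or 'open'")
    if scale == "closed":
        v = min(max(v, p1), p2)  # collapse the tails onto the landmarks
    if v <= mu:
        return s1 + (v - p1) * (mu_s - s1) / (mu - p1)
    return mu_s + (v - mu) * (s2 - mu_s) / (p2 - mu)
```

On the open scale, tail intensities can fall outside the standard range, so they remain distinguishable from one another after the transformation.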



Choosing the Landmarks. The choice of the actual landmark is also an im- 
portant factor. The mode-based method described above works fine for several 
MR protocols and several body regions but there are cases (and applications) 
wherein this simple method is not appropriate. As an example, consider the 
shape of the gray matter (GM), white matter (WM), and CSF distributions in 
fast spin-echo (FSE) proton density (PD) brain images. Their relative locations 
vary among studies and even among studies of the same patient. Figure 1 shows
some histogram shapes all of which were found in histograms of patient stud- 
ies. We recall here that in FSE PD images, GM regions are brighter than WM 
regions. The weakness of the mode-based method is that sometimes the mode 
(the peak location) corresponds to GM intensity, and in other
cases, it corresponds to WM intensity, or it may also correspond
to intensities that lie between real GM and WM (see Fig. 1). Therefore, when we
match the mode to a fixed location on the normalized scale, we may match 
GM in some cases and WM in the others. Because of this “switching” behav- 
ior, the mode-based method is often not appropriate if the application is image 
segmentation, where we need more accurate meaning on the normalized scale 
even for relatively small ranges. In order to eliminate the “switching” behavior, 
one approach is to choose the median of the main body of the histogram as a 
landmark to match. We do this on the reduced histogram (i.e., after removing 
the background and the noise (high percentile)). This landmark remains consis- 
tent even in cases where the histogram has two similar peaks (Figs. Gt, GJ) or 
asymmetric shape (Figs. Gl>, Gtl)- We may also use more histogram landmarks, 
such as quartiles and deciles, to better define the standard histogram. 
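The median landmark can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the histogram is a list of counts indexed by intensity, the background is taken to be every intensity up to a hypothetical threshold `background_max`, and the helper names are invented here.

```python
def percentile_intensity(hist, pc):
    """Smallest occupied intensity whose cumulative count reaches pc percent."""
    target = sum(hist) * pc / 100.0
    cum = 0
    for intensity, count in enumerate(hist):
        cum += count
        if count > 0 and cum >= target:
            return intensity
    return len(hist) - 1


def median_landmark(hist, background_max=0, pc1=0.0, pc2=99.8):
    """Median of the reduced histogram (background and high tail removed)."""
    # Drop the background bins (assumed: intensities <= background_max).
    fg = [0 if i <= background_max else c for i, c in enumerate(hist)]
    # Percentile landmarks of the foreground.
    p1 = percentile_intensity(fg, pc1)
    p2 = percentile_intensity(fg, pc2)
    # Keep only the main body [p1, p2] and take its median.
    reduced = [c if p1 <= i <= p2 else 0 for i, c in enumerate(fg)]
    return percentile_intensity(reduced, 50.0)
```

For a histogram with two equally high peaks, the mode depends on which peak happens to be marginally higher, whereas the median of the reduced histogram stays between them.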








Fig. 1. Shapes of brain MRI histograms. For clarity, only the main body of the his- 
togram (corresponding to the brain) is shown 

3 Evaluation

For the validation of the method for each protocol and body region we need to 
consider the following variations in image data: (i) intra-patient (time-to-time) 
variation, (ii) inter-patient variation, (iii) variations among different machines of 
the same manufacturer, and (iv) variations among machines of different types. 
The following sections describe the methods of evaluation that we used to
examine how different kinds of variations are affected by the different
variants of the normalization. For all tests, $s_1 = 1$, $s_2 = 4095$,
$pc_1 = 0$, and $pc_2 = 99.8$ were used with “closed” and “open” scale, mode-
and median-based variants, using linear segment-by-segment mapping. The
training was done by using 10 different patient studies of the particular
protocol and body region.

3.1 Qualitative Comparison 

We conducted qualitative comparisons for the following MRI protocols: FSE PD,
FSE T2, spin-echo (SE) PD, SE T2, T1 with Gadolinium enhancement (T1E),
and SPGR. Thirty studies each of FSE PD, FSE T2, and T1E, and 10 studies each
of SE PD, SE T2, and SPGR were transformed using the corresponding “trained”
parameters. Two kinds of visual comparison were made: by displaying the images
at fixed gray level window settings before and after transformation, and by
displaying the binary images obtained at fixed threshold ranges. For lack of
space, only the former is illustrated below.

Images in the first row of Fig. 2 show a slice from each of three different
patient studies. They are displayed at the same gray level window, which was
set up for the first image. This window is not appropriate for the other two
data sets because they have quite different intensity ranges. In the second
row, the same slices are displayed, after normalization with the open-scale
median-based method, at a fixed “standard” brain window that we devised after
examining a few normalized images. The structures are well portrayed and the
contrast is more similar than that of the originals.
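Displaying images at a fixed gray level window amounts to the following mapping. It is a generic sketch of window display, not taken from the paper; the function name, the 8-bit output range, and the window limits in the usage note are assumptions.

```python
def apply_window(value, low, high, out_max=255):
    """Map an intensity to a display gray level using a fixed window [low, high]."""
    if value <= low:
        return 0            # everything below the window is shown black
    if value >= high:
        return out_max      # everything above the window is shown white
    return round((value - low) * out_max / (high - low))
```

If the same tissue has intensity 150 in one study and 400 in another, a hypothetical window `[100, 300]` renders it dark gray in the first and saturated white in the second, which is why a single window only works once the intensity scales agree.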



3.2 Quantitative Comparison 

Two types of quantitative tests were conducted on brain data sets obtained with
three protocols: FSE PD, FSE T2, and T1E.



Fig. 2. Images displayed at fixed gray level windows. Original FSE PD images from
different patients (first row), and after normalizing with the open-scale median-based
method (second row)

Test 1: Intra-patient Variation. We used the same training data sets and
parameter configurations as for the qualitative comparison. The test method for
all three protocols was the same. For each of 15 patients, two scans acquired
at different times were randomly selected. The time interval between the two
scans of the same patient varied between 1 and 6 years. For each patient, we
registered the first scan to the second via a rigid transformation based on
intensity value correlations. Because these patients had Multiple Sclerosis
(MS), the lesions were segmented and removed for the purpose of comparison.
Without this step, the difference between the images due to the disease would
perhaps have distorted the results. The similarity of a pair of these
registered, lesion-removed images was measured by the mean squared intensity
difference, normalized to the original range of the images (NMSD). This
similarity measure was computed for every pair of images before and after
normalization for the different parameter configurations. Table 1 shows that
the mean value of the NMSD after transformation is smaller than that before
transformation. It also shows that the median-based normalization reduces the
mean of NMSD further. The mean values of NMSD for the pairs of studies were
compared using the paired t-test. The results show that the change in the means
of NMSD is statistically significant (mostly p < 0.01) for all three
comparisons: before versus after mode-based, before versus after median-based,
and mode- versus median-based normalization.
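The two quantities used in this test can be sketched as follows. The text does not spell out the exact normalization of the NMSD, so dividing each difference by the intensity range before squaring is an assumption, as are the function names; images are given as flat lists of registered voxel intensities.

```python
import math
import statistics


def nmsd(img_a, img_b, intensity_range):
    """Mean squared difference of two registered images, with differences
    expressed as a fraction of the intensity range (assumed normalization)."""
    sq = [((a - b) / intensity_range) ** 2 for a, b in zip(img_a, img_b)]
    return sum(sq) / len(sq)


def paired_t(first, second):
    """Paired t statistic for two lists of per-patient measurements."""
    d = [x - y for x, y in zip(first, second)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
```

`paired_t` would be applied to the per-patient NMSD values of two configurations (for example before and after normalization); the statistic has `len(d) - 1` degrees of freedom.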



Table 1. Mean and standard deviation of NMSD before and after normalization with 
the closed-scale mode-based and with the open-scale median-based method 





                      FSE PD             FSE T2             T1E
                      mean     sd        mean     sd        mean     sd
before                0.0199   0.0177    0.0217   0.0182    0.0110   0.0074
after mode-based      0.0078   0.0102    0.0080   0.0080    0.0085   0.0072
after median-based    0.0039   0.0058    0.0036   0.0051    0.0019   0.0017




Test 2: Inter-patient Variation. For this comparison we randomly selected 12
FSE PD and 12 FSE T2 data sets from our database. All images were previously
segmented into WM, GM, CSF, and MS lesion (LS) regions. We calculated the
statistics over the population of images for each of these regions separately.
The normalization parameters were the same as those for the other comparisons.
For each of these regions in each image $i$ in each of these protocols, we
calculated the normalized mean intensity (NMI) by dividing the mean intensity
in the region by $m_{2i} - m_{1i}$. This was repeated for each set of the
transformed images, wherein normalization was done by dividing the mean
intensity in the region by $s_2 - s_1$. The coefficients of variation of the
NMI values before and after normalization are shown in Table 2. The table
indicates that the intensities on the normalized scale have more consistent
tissue meaning than those on the original scale and that the median-based
normalization outperforms the mode-based method in achieving similar tissue
meaning of intensities.
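The statistic reported for this test can be sketched as follows. The function names are illustrative, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def normalized_mean_intensity(region_mean, scale_min, scale_max):
    """NMI: mean intensity of a tissue region divided by the scale range."""
    return region_mean / (scale_max - scale_min)


def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample standard deviation assumed)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)
```

For the original images the scale limits are the per-image values $m_{1i}$ and $m_{2i}$; for the transformed images they are the common limits $s_1$ and $s_2$. The coefficient of variation is then taken over the NMI values of one tissue across all images of a protocol.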



Table 2. Coefficient of variation, expressed in %, of the NMI of different tissues in 
FSE PD and T2 images 





                      WM               GM               CSF              LS
                      PD      T2       PD      T2       PD      T2       PD      T2
before                14.61   14.83    51.23   51.81    46.77   46.85    31.54   31.61
after mode-based       2.55    2.13     1.95    2.18     3.24    5.46     3.20    5.01
after median-based     1.59    2.53     1.26    1.85     2.97    5.11     2.51    5.04




4 Concluding Remarks 

The proposed intensity scale normalization methods yield images whose
intensities have more similar meanings than those of the original images,
according to both qualitative and quantitative measures. Using the open-scale
variant, it is possible to set better intensity of interest ranges while still
being able to distinguish relevant information at the ends of the scale.
Intensity values in the transformed images have more consistent meanings, and
tissues have better defined ranges on the median-based normalized scale than on
the mode-based normalized scale. Quantitative tests showed that the normalized
mean squared difference between two different scans of the same subject is
reduced if the new median-based and open-scale variants are used, and that this
change is statistically significant. They also showed that the inter-patient
variation of the intensities within different tissues decreases.

Abstract. A method is presented for determining the intensity mapping
between MRI images that may have been acquired using different
sequences or instruments. The method can be applied to fully elastic
matching and produces spatially localized probability functions that are
capable of representing in an efficient way strong intensity distortions
due, for instance, to the shading effect in MRI.



1 Introduction 

In image matching, the correspondence between the images is evaluated using
a similarity measure which quantifies the plausibility of observing an
arbitrary feature in one image when a given feature is seen in the second
image. In this work, we present a measure designed for intensity values as they
appear on MRI images that may have been acquired using different sequences or
scanners. The measure takes into account partial volume voxels, adapts to
spatially varying intensity degradations, and is estimated jointly with the
unknown mapping that warps the two images into spatial register.

Various similarity measures that utilize statistical properties of the
registered images have been proposed recently and used with great success to
rigidly register multi-modal images of the same scene [1,2,3]. Maintz et al.
[4] have made a preliminary attempt to extend the mutual information measure to
non-rigid registration by using the alignment result from optimizing the
measure as a first estimate to the elastic correction of small deformations. In
related work, Gee and co-workers [5] developed a non-rigid matching technique
capable of handling large-valued deformations, in which the intensity mapping
between the images is represented as a conditional probability density that is
determined simultaneously with the calculation of the unknown spatial
transformation. Different approaches to estimating this conditional density
have been further investigated in [6], and these are extended in the current
work, wherein a formal model is constructed that explicitly considers partial
volume and position-dependent effects.

A. Kuba et al. (Eds.): IPMI’99, LNCS 1613, pp. 496 ff., 1999.
© Springer-Verlag Berlin Heidelberg 1999

Fig. 1. (a) A causal network for the observation model. (b) A voxel V with intensity
A at position P, composed of more than one tissue T

2 Methods 

The relationship between the variables of the intensity mapping problem in
image matching can be better understood when they are displayed as a causal
model [7]. The model in Fig. 1a specifies that the intensity of each voxel in
the image depends on the tissue composition within the voxel and on the voxel’s
position within the scanner volume. This model shows that the same voxel
composed of a fixed combination V of different tissue types may produce
different intensities A and B when placed in different locations or scanners P
and P', respectively. Fig. 1b illustrates the problem of partial volume tissue
composition, in which a single voxel V at position P may be composed of more
than one tissue type T. In this case, the intensity A assigned to the voxel
will be a weighted average of its component tissue intensities, whose values
are a function of position within the imaging volume.

Since the intensity variable A is influenced by the voxel’s partial volume
mixture V of tissue types and the position P at which the voxel is placed, the
probability that it assumes a value $a$ is conditioned on the values of V and
P. This quantitative information is denoted as $P(a \mid v, p)$, also known as
an observation or sensor model. We can state the same for intensity
observations in the second image for the matching problem, where
$P(b \mid v, p')$ is the conditional probability that a tissue mixture $v$
produces intensity $b$ at position $p'$. The goal of this work is to determine
the conditional probability relating our observation models so that it can be
used to guide the matching process.

The relationship between tissue type and partial volume composition is
represented in the causal model graph by an arrow that conditions the
probability of V given a pure tissue type T. Since the proportion of tissue
types in an arbitrary voxel can assume any relative combination, the variable V
takes on correspondingly many discrete values. The closer this number is to the
number of intensity values displayed in the images, the more accurate our
representation of the partial volume mixtures will be. The conditional
probability matrix $P(v \mid t)$ and the prior $P(t)$ can be estimated using
the intensity histograms obtained from a labeled atlas.



498 A. M. C. Machado, M.F.M. Campos, and J. C. Gee 



During the matching process, for each voxel located at $p'$ of known intensity
$b$ in image B, the aim is to find its corresponding voxel in image A. To do
so, it is necessary to determine how likely it is for a voxel with intensity
$a$ placed at position $p$ to contain the same tissue mixture that the original
voxel in B is composed of. The problem can be stated as the determination of
the probability $P(a \mid p, b, p')$ that a voxel in image A induces intensity
$a$ given that it is placed at position $p$ and corresponds to intensity $b$ at
position $p'$ in the second image B. Conditioning $P(a \mid p, b, p')$ on the
exhaustive set of partial volume mixture values, we have that

$$P(a \mid p, b, p') = \sum_k \left[ P(a \mid p, b, p', v_k)\, P(v_k \mid p, b, p') \right].$$

In addition to the dependencies between the variables, the causal model in
Fig. 1a also represents the conditional independence relationships between
them: given that the tissue mixture value $v$ is known, the information about
variables B and P' does not contribute to our belief about the value of A. In
other words, A is conditionally independent of B and P' given that V is known:
$P(a \mid p, b, p', v) = P(a \mid v, p)$. Moreover, since A is unknown, it
causes the variable V to be independent of P:
$P(v \mid p, b, p') = P(v \mid b, p')$. From these independence relationships,
it follows that

$$P(a \mid p, b, p') = \sum_k \left[ P(a \mid v_k, p)\, P(v_k \mid b, p') \right]. \tag{1}$$

Using Bayes’s formula, we have that

$$P(v_k \mid b, p') = P(b, p' \mid v_k)\, P(v_k) / P(b, p'),$$

which together with (1) leads, up to the factor $1/P(b, p')$ that does not
depend on $k$, to

$$P(a \mid p, b, p') = \sum_k \left[ P(a \mid p, v_k)\, P(b, p' \mid v_k)\, P(v_k) \right]. \tag{2}$$

Using the definition of conditional probability, we have that

$$P(b, p' \mid v) = P(b \mid v, p')\, P(p' \mid v), \tag{3}$$

where $P(p' \mid v) = P(p')$ since P' is independent of V when variable B is
unknown. From (2) and (3) it follows, again up to a factor that does not depend
on $k$, that

$$P(a \mid p, b, p') = \sum_k \left[ P(a \mid p, v_k)\, P(b \mid p', v_k)\, P(v_k) \right]. \tag{4}$$

Finally, conditioning $P(v_k)$ on the exhaustive set of tissue types $t_i$,
(4) becomes

$$P(a \mid p, b, p') = \sum_k \Big[ P(a \mid p, v_k)\, P(b \mid p', v_k) \sum_i \left[ P(v_k \mid t_i)\, P(t_i) \right] \Big]. \tag{5}$$

We see that $P(a \mid p, b, p')$ is an average of the product of the
observation models for each partial volume tissue mixture, weighted by its a
priori probability.
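Because the probabilities are stored as tables, the sum in (5) can be evaluated directly. The following is a minimal sketch for one fixed combination of $a$, $p$, $b$, and $p'$; the function and argument names are assumptions, and the numbers in the usage note are made up.

```python
def mapping_probability(p_a_given_v, p_b_given_v, p_v_given_t, p_t):
    """Evaluate the double sum of Eq. (5) for fixed a, p, b, p'.

    p_a_given_v[k]    : P(a | p, v_k)
    p_b_given_v[k]    : P(b | p', v_k)
    p_v_given_t[k][i] : P(v_k | t_i)
    p_t[i]            : P(t_i)
    """
    total = 0.0
    for k in range(len(p_a_given_v)):
        # Prior of mixture v_k, obtained by summing over the pure tissue types.
        prior_vk = sum(p_v_given_t[k][i] * p_t[i] for i in range(len(p_t)))
        total += p_a_given_v[k] * p_b_given_v[k] * prior_vk
    return total
```

With two mixtures and two tissues, `mapping_probability([0.5, 0.1], [0.4, 0.2], [[0.9, 0.2], [0.1, 0.8]], [0.5, 0.5])` weights the two products 0.2 and 0.02 by the mixture priors 0.55 and 0.45.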

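In order to use (5) for MRI image matching, the models $P(a \mid p, v)$ and
$P(b \mid p', v)$ are considered to be Gaussian. Specifically, the probability
that a tissue mixture $v_k$ imaged at position $p$ produces intensity $a$ is
described by a Gaussian distribution with mean $\mu_{Akp}$ and variance
$\sigma_A^2$. This model is appropriate for MRI images, but should be replaced
with the relevant distribution in other imaging situations. Since our
probabilities are represented in practice by tables, our method is applicable
to any class of distributions. Assuming discrete Gaussian distributions for the
observation models, $P(a \mid p, b, p')$ becomes

$$P(a \mid p, b, p') = \sum_k \left[ \frac{e^{-(a - \mu_{Akp})^2 / 2\sigma_A^2}}{\sum_{m=0}^{M-1} e^{-(m - \mu_{Akp})^2 / 2\sigma_A^2}} \cdot \frac{e^{-(b - \mu_{Bkp'})^2 / 2\sigma_B^2}}{\sum_{m=0}^{M-1} e^{-(m - \mu_{Bkp'})^2 / 2\sigma_B^2}} \sum_i \left[ P(v_k \mid t_i)\, P(t_i) \right] \right], \tag{6}$$

where $M$ is the number of intensity values.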
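A discrete Gaussian observation model stored as a table can be sketched as follows. Normalizing over the integer intensity levels `0 .. num_levels - 1` is an assumption about how the discretization is done, and the function name is illustrative.

```python
import math


def discrete_gaussian(mean, variance, num_levels):
    """Table of P(intensity = m) for m = 0 .. num_levels - 1, proportional to
    a Gaussian with the given mean and variance and normalized to sum to 1."""
    weights = [math.exp(-((m - mean) ** 2) / (2.0 * variance))
               for m in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]
```

One such table would be built per tissue mixture and position, using the local mean intensity of that mixture and the variance measured from the image background.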

Since the variance can be assumed spatially constant and is easily determined
from the image background, the only unknowns to be computed are the mean values
$\mu_{Akp}$ and $\mu_{Bkp'}$. These values can be determined with the aid of
labeled images. Taking observation model A as an example, the value of
$\mu_{Akp}$ for each position $p$ and tissue mixture $v_k$ can be approximated
by considering the intensity values that each tissue assumes in the
neighborhood of position $p$. Based on the tissue histogram computed for the
region around position $p$ and on a prior distribution model for the tissue
types with respect to the particular acquisition protocol, the expected
distribution of partial volume tissue mixtures for the region can be
determined. The tissue mixture distribution $H_V$ and the intensity histogram
$H_I$ are then matched to determine the mean intensity of the mixture in the
region around $p$. The purpose is to determine a function $F(v_k)$ that will
indicate the corresponding intensity for each value $v_k$, so that

$$\frac{\int_0^{F(v_k)} H_I(x)\,dx}{\int H_I(x)\,dx} = \frac{\int_0^{v_k} H_V(x)\,dx}{\int H_V(x)\,dx}, \tag{7}$$

where the integrals in the denominators extend over the full range of values.

To compute the local tissue mixture mean intensities and the probability
$P(a \mid p, b, p')$, a prior model for the tissue mixtures is required. Since
a single image does not provide sufficient information to infer the partial
volume tissue composition of any voxel, the global intensity histogram for a
labeled atlas is used to estimate the prior distribution. The idea stems from
the fact that in the process of labeling the atlas into its major tissue
components the expert assigns a partial volume voxel to the pure tissue type
that is most representative of the voxel’s contents. This then is reflected in
the variance of the intensity histogram, from which a probability distribution
can be obtained. In this work, the distribution was approximated by a Gaussian
model, although other models can be used as well. The method has proven to be
robust to this assumption in the case when both images are acquired with the
same protocol. Based on the prior model $\Pi$ for tissue mixtures and the
tissue histogram $H_T$, the histogram $H_V$ in (7) can be computed as
$H_V(v) = \sum_i H_T(t_i)\, \Pi(t_i, v)$, where a normally distributed prior
with means $\mu_i$ and variances $\sigma_i^2$ for each tissue $t_i$ implies
that $\Pi(t_i, v) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{(v - \mu_i)^2}{2\sigma_i^2}\right)$.
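The two steps described above, building the mixture histogram from the tissue histogram with a Gaussian prior and then matching it to the intensity histogram, can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: histograms are lists indexed by integer value, the matching equates normalized cumulative sums (a discrete form of cumulative histogram matching), and all names are invented here.

```python
import math


def mixture_histogram(tissue_hist, means, variances, num_values):
    """H_V(v) = sum_i H_T(t_i) * Pi(t_i, v) with a Gaussian prior Pi."""
    h_v = []
    for v in range(num_values):
        total = 0.0
        for h_t, mu, var in zip(tissue_hist, means, variances):
            gauss = math.exp(-((v - mu) ** 2) / (2.0 * var))
            total += h_t * gauss / math.sqrt(2.0 * math.pi * var)
        h_v.append(total)
    return h_v


def match_histograms(h_v, h_i):
    """Return F, where F[v] is the smallest intensity whose normalized
    cumulative H_I reaches the normalized cumulative H_V at v."""
    sum_v, sum_i = sum(h_v), sum(h_i)
    f, cum_v, cum_i, x = [], 0.0, h_i[0], 0
    for v in range(len(h_v)):
        cum_v += h_v[v]
        # Advance the intensity until its cumulative fraction catches up.
        while x < len(h_i) - 1 and cum_i / sum_i < cum_v / sum_v - 1e-12:
            x += 1
            cum_i += h_i[x]
        f.append(x)
    return f
```

Matching a histogram to a copy of itself shifted by two bins returns a function that shifts every occupied value by two, which is the behavior expected of $F$.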

3 Experimental Results 

The set of MRI images used as the input to the algorithms was extracted from
the Harvard Atlas [8]. The atlas was reformatted into 8-bit 256x123 horizontal
slices. All voxels not classified as gray matter, white matter, or cerebrospinal
fluid were given the gray-level value 0. In order to demonstrate the method’s
robustness to intensity distortions, a second volume was created by applying to
the atlas a multiplicative low-frequency sinusoidal signal with an amplitude of
0.2.
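A distortion of this kind can be simulated as follows. The text gives only the amplitude, so the direction of the sinusoid (along the image columns), its frequency (one cycle across the image), and the clipping to the 8-bit range are assumptions made for illustration.

```python
import math


def apply_sinusoidal_bias(image, amplitude=0.2, cycles=1.0):
    """Multiply each pixel by 1 + amplitude * sin(2*pi*cycles*x/width).

    image is a list of rows of 8-bit gray levels; x is the column index.
    """
    width = len(image[0])
    distorted = []
    for row in image:
        new_row = []
        for x, value in enumerate(row):
            gain = 1.0 + amplitude * math.sin(2.0 * math.pi * cycles * x / width)
            new_row.append(min(255, max(0, round(value * gain))))
        distorted.append(new_row)
    return distorted
```

Because the distortion is multiplicative, voxels set to 0 (everything outside gray matter, white matter, and cerebrospinal fluid) remain 0.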




Fig. 3. (a) Result of warping slice 124 (image A) to match slice 129 (image B). (b)
Inferred global probability map $P(I_B \mid I_A)$



The method was evaluated using slice 124 (image A) of the original atlas and
slice 129 of the noisy version of the atlas (image B); see Fig. 2. The result
of deforming image A (slice 124) to match image B (noisy slice 129) is shown in
Fig. 3a, with the inferred global probability map $P(I_B \mid I_A)$ depicted in
Fig. 3b. The probabilities are displayed in gray scale so that the largest in
each column appears white. The grid dimensions are proportional to the
contribution of each intensity value in the image histograms. As can be seen,
the deformed image is similar to the target image B but correctly exhibits the
distribution of intensity values found in image A.

4 Conclusion 

A method is presented for determining the intensity mapping between MRI images
that may have been acquired using different sequences or scanners. The mapping
is estimated directly from the image data, explicitly models partial volume
voxels and spatially varying intensity degradations, and is computed jointly
with the unknown spatial transformation in an iterative matching algorithm. The
importance of the method is twofold. First, it is a tool to model the
instruments used in the acquisition step so that more effective data processing
techniques can be developed. Second, for the important problem of image
matching, the method makes possible a principled approach to likelihood
modeling or the construction of similarity metrics. A poor model of the
intensity mapping for the image pair to be registered may lead to false
matches, regardless of the prior constraints employed, and will bias all
subsequent morphological analyses.